Abstract: We determine the capacity of compound classical-quantum channels. As a consequence we obtain the capacity formula for averaged classical-quantum channels. The capacity result for compound channels demonstrates, as in the classical setting, the existence of reliable universal classical-quantum codes in scenarios where the only a priori information about the channel used for the transmission of information is that it belongs to a given set of memoryless classical-quantum channels. Our approach is based on a universal classical approximation of the quantum relative entropy, which in turn relies on a universal hypothesis testing result.

Index Terms: Compound quantum channels, averaged quantum channels, coding theorem, capacity, universal quantum codes



I. Introduction 

In this paper we present the coding theorems for compound 
and averaged channels with classical input and quantum output 
(cq-channels). The result nicely supplements recent results of 
Datta and Dorlas [6] where they considered finite weighted 
sums of memoryless quantum channels and determined their 
classical capacity. This is one of the basic examples of 
channels with long-term memory. This is obviously equivalent 
to the determination of the classical capacity for the associated 
compound channel consisting of finitely many channels, since 
for finite sums we can easily bound the error probabilities of 
the individual memoryless branches by the error probability 
of the averaged channel and vice versa. Unfortunately, the 
beautiful method of proof in [6] does not apply when the 
number of channels is infinite. 

Roughly, the interest in compound channels is motivated by the fact that in many situations we have only limited knowledge about the channel which is used for the transmission of information. In the compound setting we know merely that the memoryless cq-channel in use belongs to some given finite or infinite set of memoryless cq-channels which is known a priori to the sender and receiver. Their goal is to construct coding-decoding strategies that work well for the whole set of channels simultaneously. The situation is comparable with the universal source coding scenario considered in [17] by Jozsa and M., P., and R. Horodecki. Averaged cq-channels are close relatives of compound channels, the difference being that in this situation the communicating parties have access to an additional a priori probability distribution governing the appearance of the particular member of the compound channel.

This work is supported by the Deutsche Forschungsgemeinschaft DFG via project Bj 57/1-1 "Entropie und Kodierung großer Quanten-Informationssysteme".
The paper is organized as follows: In Section II we give a rapid overview of the classical theory of compound channels. Section III is devoted to the notion of compound cq-channels and the definition of the capacity for this class of channels. The subsequent Section IV contains the first pillar of our argument. Namely, we construct, using an idea going back to Nagaoka, a universal classical approximation of the quantum relative entropy for classes of uncorrelated quantum states. The central Section V starts with a relation between a minimization procedure arising in universal hypothesis testing and the minimization process required for the determination of the capacity of compound cq-channels, which is based on Donald's inequality (cf. Lemmata 5.1 and 5.3). Then we proceed with the direct and the (strong) converse part of the coding theorem for compound cq-channels.^1 As a by-product we can prove in Section VI the coding theorem and the weak converse for arbitrary averaged cq-channels with memoryless branches. This extends, in part, the results of Ahlswede [2] to the cq-situation. Moreover, the results of Datta and Dorlas [6] are generalized to averages of memoryless cq-channels with respect to arbitrary probability measures, provided the set of channels has some appropriate measurable structure.

A. Notation 

We will assume tacitly throughout the paper that all Hilbert spaces are over the field $\mathbb{C}$. The identity operator acting on a Hilbert space $\mathcal{H}$ is denoted by $\mathbf{1}_{\mathcal{H}}$, or simply by $\mathbf{1}$ if it is clear from the context which Hilbert space is under consideration. The set of density operators acting on the finite-dimensional Hilbert space $\mathcal{H}$ is denoted by $\mathcal{S}(\mathcal{H})$, and the set of probability distributions on a finite set $A$ is abbreviated by $\mathcal{P}(A)$. $|A|$ denotes the cardinality of the set $A$. The projection onto the range of a density operator $\rho \in \mathcal{S}(\mathcal{H})$, $\dim \mathcal{H} < \infty$, is called the support of $\rho$ and is denoted by $\mathrm{supp}(\rho)$.

^1 After the submission of this paper Hayashi [12] obtained a similar result via Weyl-Schur duality. His result can be used to give another proof of the direct part of the coding theorem for averaged channels. His error bounds are exponential but depend on the channel.

The relative entropy of the state (i.e. density operator) $\rho$ with respect to the state $\sigma$ is given by

$$S(\rho\|\sigma) := \begin{cases} \mathrm{tr}(\rho \log \rho - \rho \log \sigma) & \text{if } \mathrm{supp}(\rho) \le \mathrm{supp}(\sigma) \\ \infty & \text{else} \end{cases}$$

where tr stands for the trace and log is the binary logarithm. The classical analog of the relative entropy, known as the Kullback-Leibler distance, is defined by

$$D(p\|q) := \begin{cases} \sum_{a \in A} p(a) \log p(a) - p(a) \log q(a) & \text{if } p \ll q \\ \infty & \text{else} \end{cases}$$

where $p, q \in \mathcal{P}(A)$. The relation $p \ll q$ means that $q(a) = 0$ for some $a \in A$ implies $p(a) = 0$ or, equivalently, that $\mathrm{supp}(p) \subset \mathrm{supp}(q)$, where $\mathrm{supp}(p) := \{a \in A : p(a) > 0\}$.

The von Neumann entropy of a density operator $\rho \in \mathcal{S}(\mathcal{H})$, $\dim \mathcal{H} < \infty$, is defined to be $S(\rho) := -\mathrm{tr}(\rho \log \rho)$. The Shannon entropy of $p \in \mathcal{P}(A)$, $|A| < \infty$, is given by $H(p) := -\sum_{x \in A} p(x) \log p(x)$.

The $n$-fold Cartesian product of a finite set $A$ with itself is denoted by $A^n$. We set $x^n := (x_1, \ldots, x_n)$ for sequences $(x_1, \ldots, x_n) \in A^n$.

The notation we use for the logarithms is as follows: $\log_a$ is the logarithm to the base $a > 1$ and $\log$ is understood as $\log_2$.
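
The definitions above can be checked numerically. The following is a minimal sketch and not part of the original argument: it assumes qubit states ($d = 2$) described by their Bloch vectors, so that only the Python standard library is needed, and the function names are illustrative. For commuting (diagonal) states the quantum relative entropy reduces to the Kullback-Leibler distance.

```python
import math

def kl_divergence(p, q):
    """Kullback-Leibler distance D(p||q) in bits; infinite unless p << q."""
    total = 0.0
    for pa, qa in zip(p, q):
        if pa == 0.0:
            continue              # convention: 0 log 0 = 0
        if qa == 0.0:
            return math.inf       # supp(p) is not contained in supp(q)
        total += pa * math.log2(pa / qa)
    return total

def _norm(v):
    return math.sqrt(sum(c * c for c in v))

def qubit_entropy(r):
    """von Neumann entropy (bits) of the qubit state (1 + r.sigma)/2."""
    lam = (1.0 + _norm(r)) / 2.0  # larger eigenvalue
    if lam <= 0.0 or lam >= 1.0:
        return 0.0
    return -lam * math.log2(lam) - (1.0 - lam) * math.log2(1.0 - lam)

def qubit_relative_entropy(r, s):
    """S(rho||sigma) in bits for Bloch vectors r (rho) and s (sigma).

    Assumption: |s| < 1, i.e. sigma is invertible, so the value is finite.
    """
    ns = _norm(s)
    log_plus = math.log2((1.0 + ns) / 2.0)
    log_minus = math.log2((1.0 - ns) / 2.0)
    overlap = sum(a * b for a, b in zip(r, s)) / ns if ns > 0 else 0.0
    tr_rho_log_sigma = (log_plus + log_minus) / 2 + (log_plus - log_minus) / 2 * overlap
    return -qubit_entropy(r) - tr_rho_log_sigma

# Diagonal states diag(0.8, 0.2) and diag(0.5, 0.5) have Bloch vectors along z.
quantum = qubit_relative_entropy((0.0, 0.0, 2 * 0.8 - 1), (0.0, 0.0, 0.0))
classical = kl_divergence([0.8, 0.2], [0.5, 0.5])
print(quantum, classical)
```

The Bloch-vector shortcut is a design choice made only to avoid a linear-algebra dependency; for general dimension one would diagonalize the density operators instead.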

II. Short Overview of the Classical Theory of 
Compound Channels 

The basic classical theory of compound channels was 
developed independently by Blackwell, Breiman, Thomasian 
[4] and Wolfowitz [24]. Blackwell, Breiman and Thomasian 
proved the coding theorem with the weak converse. Wolfowitz, 
on the other hand, obtained the coding theorem with the 
strong converse for the maximum error criterion by an entirely 
different method of proof. We briefly recall the capacity formula here, just to emphasize its similarity to the capacity formula (6) for the cq-case.

For an arbitrary set $T$ and finite sets $A$, $B$ we consider the family of discrete channels $W_t : A \to B$, $t \in T$. The compound channel, denoted by $T$, is simply the whole family of discrete memoryless channels $\{W_t^n\}_{t \in T, n \in \mathbb{N}}$.
Let $\lambda \in (0,1)$. An $(n, M_n, \lambda)_{\max}$-code for the compound channel $T$ is a set of tuples $(x^n(i), B_i)_{i=1}^{M_n}$ where $x^n(i) \in A^n$, $B_i \subset B^n$, $B_i \cap B_j = \emptyset$ for $i \neq j$ and

$$W_t^n(B_i \mid x^n(i)) \ge 1 - \lambda$$

for all $i = 1, \ldots, M_n$ and all $t \in T$. A similar definition of the $(n, M_n, \lambda)_{\mathrm{av}}$-codes can be given simply by replacing the maximum error criterion by the average one. Thus the goal is to find reliable codes which work well for all discrete memoryless channels indexed by the set $T$.
The work [4], [24] can be summarized as follows: The weak capacity $C(T)$ of the compound channel $T$ with respect to both the maximum and average error criteria is given by

$$C(T) = \max_{p \in \mathcal{P}(A)} \inf_{t \in T} I(p, W_t), \qquad (1)$$

where $\mathcal{P}(A)$ denotes the set of probability distributions on $A$ and $I(p, W_t)$ is the mutual information of the channel $W_t$ with respect to the input distribution $p$. Wolfowitz has shown that the RHS of (1) is the strong capacity with respect to the maximum error criterion. Ahlswede gives an example in [1] demonstrating that, surprisingly, the strong converse need not hold for compound channels if the average probability of error is used in the definition of the capacity.

III. Compound CQ-Channels

We consider here a set of cq-channels $W_t : A \ni x \mapsto D_{t,x} \in \mathcal{S}(\mathcal{H})$, $t \in T$, for an arbitrary set $T$, where $A$ is a finite set and $\mathcal{H}$ is a finite-dimensional Hilbert space. The $n$-th memoryless extension of the cq-channel $W_t$ is given by $W_t^n(x^n) := D_{t,x^n} := D_{t,x_1} \otimes \ldots \otimes D_{t,x_n}$ for $x^n \in A^n$. The compound cq-channel is given by the family $\{W_t^n\}_{t \in T, n \in \mathbb{N}}$. We will write simply $T$ for the compound cq-channel.

An $n$-code, $n \in \mathbb{N}$, for the compound cq-channel $T$ is a family $C_n := (x^n(i), b_i)_{i=1}^{M_n}$ consisting of sequences $x^n(i) \in A^n$ and positive semi-definite operators $b_i \in \mathcal{B}(\mathcal{H})^{\otimes n}$ such that $\sum_{i=1}^{M_n} b_i \le \mathbf{1}_{\mathcal{H}}^{\otimes n}$. The number $M_n$ is called the size of the code.

A code $C_n$ is called an $(n, M_n, \lambda)_{\max}$-code for the compound cq-channel $T$ if the size of $C_n$ is $M_n$ and if

$$e_m(t, C_n) := \max_{i=1,\ldots,M_n} \bigl(1 - \mathrm{tr}(D_{t,x^n(i)} b_i)\bigr) \le \lambda \quad \forall t \in T, \qquad (2)$$

with an analogous definition of an $(n, M_n, \lambda)_{\mathrm{av}}$-code w.r.t. the average error probability criterion, i.e. we replace $e_m(t, C_n) \le \lambda$ by

$$e_a(t, C_n) := \frac{1}{M_n} \sum_{i=1}^{M_n} \bigl(1 - \mathrm{tr}(D_{t,x^n(i)} b_i)\bigr) \le \lambda \quad \forall t \in T$$

in the definition.

Thus an $(n, M_n, \lambda)_{\max}$-code for the compound channel $T$ ensures that the maximal error probability for all channels of class $T$ is bounded from above by $\lambda$. A more intuitive description of the compound channel is that the sender and receiver do not know which channel from the set $T$ is used during the transmission of the $n$-block. Their prior knowledge is merely that the channel is memoryless and belongs to the set $T$. This is a channel analog of the universal source coding problem for a set of memoryless sources (cf. [17]).

A real number $R \ge 0$ is said to be an achievable rate for the compound channel if there is a sequence of codes $(C_n)_{n \in \mathbb{N}}$ of sizes $M_n$ such that

$$\liminf_{n \to \infty} \frac{1}{n} \log M_n \ge R, \qquad (3)$$

and

$$\lim_{n \to \infty} \sup_{t \in T} e(t, C_n) = 0. \qquad (4)$$

The weak capacity, denoted by $C(T)$, of the compound channel $T$ is defined as the least upper bound of all achievable rates.

$R \ge 0$ is called a $\lambda$-achievable rate for the compound channel $T$, $\lambda \in [0, 1)$, if there is a sequence of codes $(C_n)_{n \in \mathbb{N}}$ of sizes $M_n$ for which (3) holds but the error condition is relaxed to

$$\sup_{t \in T} e(t, C_n) \le \lambda \quad \forall n \in \mathbb{N}.$$

The $\lambda$-capacity $C(T, \lambda)$ is the least upper bound of all $\lambda$-achievable rates.

The Holevo information of a cq-channel $W_t : A \to \mathcal{S}(\mathcal{H})$ with respect to the input distribution $p \in \mathcal{P}(A)$ is defined by

$$\chi(p, W_t) := S(D_t) - \sum_{x \in A} p(x) S(D_{t,x}), \qquad (5)$$

where $D_t := \sum_{x \in A} p(x) D_{t,x}$ and $S(\cdot)$ stands for the von Neumann entropy.
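
As an illustration of definition (5), the following minimal sketch evaluates the Holevo information of a qubit cq-channel. It is not taken from the paper: it assumes $d = 2$, represents each output state $D_{t,x}$ by its Bloch vector, and uses illustrative function names.

```python
import math

def binary_entropy(x):
    """Shannon entropy (bits) of the distribution (x, 1 - x)."""
    if x <= 0.0 or x >= 1.0:
        return 0.0
    return -x * math.log2(x) - (1.0 - x) * math.log2(1.0 - x)

def qubit_entropy(r):
    """von Neumann entropy of the qubit state with Bloch vector r."""
    length = math.sqrt(sum(c * c for c in r))
    return binary_entropy((1.0 + length) / 2.0)

def holevo_information(p, bloch_vectors):
    """chi(p, W) = S(sum_x p(x) D_x) - sum_x p(x) S(D_x) for a qubit cq-channel.

    bloch_vectors[x] is the Bloch vector of the output state D_x.
    """
    # The Bloch vector of a mixture is the mixture of the Bloch vectors.
    mean = [sum(px * r[i] for px, r in zip(p, bloch_vectors)) for i in range(3)]
    return qubit_entropy(mean) - sum(px * qubit_entropy(r) for px, r in zip(p, bloch_vectors))

orthogonal = [(0.0, 0.0, 1.0), (0.0, 0.0, -1.0)]     # outputs |0><0| and |1><1|
nonorthogonal = [(0.0, 0.0, 1.0), (1.0, 0.0, 0.0)]   # outputs |0><0| and |+><+|
print(holevo_information([0.5, 0.5], orthogonal))
print(holevo_information([0.5, 0.5], nonorthogonal))
```

Orthogonal pure outputs carry one full bit under the uniform input distribution, while non-orthogonal outputs carry strictly less.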

As shown in [16], [20], [23], and [19], the $\lambda$-capacity of a single memoryless cq-channel $W$ is given by

$$C(W, \lambda) = \max_{p \in \mathcal{P}(A)} \chi(p, W) \quad \forall \lambda \in (0,1).$$

The main result of our paper is an analog of the capacity formula (1) and can be stated as follows.

Theorem 3.1: Let $T$ be an arbitrary compound cq-channel with finite input alphabet $A$ and finite-dimensional output Hilbert space $\mathcal{H}$. Then

$$C(T, \lambda) = \max_{p \in \mathcal{P}(A)} \inf_{t \in T} \chi(p, W_t) \qquad (6)$$

holds for any $\lambda \in (0, 1)$.

Proof: The achievability, i.e. the inequality

$$C(T, \lambda) \ge \max_{p \in \mathcal{P}(A)} \inf_{t \in T} \chi(p, W_t),$$

follows from Theorem 5.10. On the other hand, Theorem 5.13 shows that we cannot do better than the right-hand side of (6), which establishes the inequality

$$C(T, \lambda) \le \max_{p \in \mathcal{P}(A)} \inf_{t \in T} \chi(p, W_t).$$
■
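
To make the max-inf structure of formula (6) concrete, here is a hedged numerical sketch for a toy compound channel. Everything specific in it is an assumption made for illustration: a finite set $T$ with two qubit channels, a binary input alphabet, Bloch-vector representation of the outputs, and a plain grid search in place of an exact optimization.

```python
import math

def binary_entropy(x):
    if x <= 0.0 or x >= 1.0:
        return 0.0
    return -x * math.log2(x) - (1.0 - x) * math.log2(1.0 - x)

def qubit_entropy(r):
    return binary_entropy((1.0 + math.sqrt(sum(c * c for c in r))) / 2.0)

def holevo_information(p, outputs):
    mean = [sum(px * r[i] for px, r in zip(p, outputs)) for i in range(3)]
    return qubit_entropy(mean) - sum(px * qubit_entropy(r) for px, r in zip(p, outputs))

def compound_capacity_binary(channels, grid=1000):
    """Grid search for max_p min_t chi(p, W_t) over binary input distributions.

    channels is a finite list; each entry holds the two output Bloch vectors.
    Returns the best value found and the corresponding p(0).
    """
    best_value, best_q = -1.0, None
    for i in range(grid + 1):
        q = i / grid
        worst = min(holevo_information([q, 1.0 - q], w) for w in channels)
        if worst > best_value:
            best_value, best_q = worst, q
    return best_value, best_q

pure, mixed = (0.0, 0.0, 1.0), (0.0, 0.0, 0.0)
w1 = [pure, mixed]   # input 0 gives a pure state, input 1 the maximally mixed state
w2 = [mixed, pure]   # the roles of the two inputs are exchanged
value, q_star = compound_capacity_binary([w1, w2])
single, _ = compound_capacity_binary([w1])
print(value, q_star, single)
```

In this toy example each channel alone prefers a biased input distribution, whereas the compound channel forces the uniform one, so the compound value lies strictly below the capacity of either member.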



IV. Universal Classical Approximation of the 
Quantum Relative Entropy 

The purpose of this section is the derivation of a universal classical approximation of quantum relative entropies of a given set $\Omega \subset \mathcal{S}(\mathcal{H})$ with respect to a reference state $\sigma \in \mathcal{S}(\mathcal{H})$. The first result of this kind was obtained in the paper [14] by Hiai and Petz in the case $|\Omega| = 1$. Basically they have shown that for given states $\rho, \sigma \in \mathcal{S}(\mathcal{H})$ we can approximate $S(\rho^{\otimes l}\|\sigma^{\otimes l})$ by the Kullback-Leibler divergence of the probability distributions $p_l$ and $q_l$ given by

$$p_l(i) := \mathrm{tr}(\rho^{\otimes l} P_i), \qquad q_l(i) := \mathrm{tr}(\sigma^{\otimes l} P_i)$$

for suitable projections $P_i = P_i(l, \rho, \sigma) \in \mathcal{B}(\mathcal{H})^{\otimes l}$ with $\sum_i P_i = \mathbf{1}_{\mathcal{H}}^{\otimes l}$. The approximation error does not exceed $\dim \mathcal{H} \cdot \log(l + 1)$. Precisely, Hiai and Petz have shown that

$$S(\rho^{\otimes l}\|\sigma^{\otimes l}) \ge D(p_l\|q_l) \ge S(\rho^{\otimes l}\|\sigma^{\otimes l}) - \dim \mathcal{H} \cdot \log(l + 1).$$

This approximation result for the quantum relative entropy was the crucial step for a construction of projections $Q_n \in \mathcal{B}(\mathcal{H})^{\otimes n}$ for each $n \in \mathbb{N}$ with the properties

1) $\lim_{n \to \infty} \mathrm{tr}(\rho^{\otimes n} Q_n) = 1$ and
2) $\limsup_{n \to \infty} \frac{1}{n} \log \mathrm{tr}(\sigma^{\otimes n} Q_n) \le -S(\rho\|\sigma)$.

These properties are exactly the direct part of the quantum version of Stein's Lemma. Subsequently, Nagaoka observed that these arguments can be reversed, i.e. starting from the direct part of Stein's Lemma we can construct a classical approximation of the quantum relative entropy by simply considering the projections $Q_n$ and $\mathbf{1}_{\mathcal{H}}^{\otimes n} - Q_n$ and the probability distributions $p_n = (\mathrm{tr}(\rho^{\otimes n} Q_n), 1 - \mathrm{tr}(\rho^{\otimes n} Q_n))$, $q_n = (\mathrm{tr}(\sigma^{\otimes n} Q_n), 1 - \mathrm{tr}(\sigma^{\otimes n} Q_n))$^2 (cf. our inequality chain (7) for more details). It is an interesting fact that Nagaoka's argument produces for each $n \in \mathbb{N}$ pairs of projections which give rise to a good approximation of the quantum relative entropy.
Our approach to the universal classical approximation is motivated by Nagaoka's argument, and therefore we need a universal version of Stein's Lemma or Sanov's Theorem from [3]. Actually we need a slightly sharper result than that obtained in [3]. The main tool to obtain this sharpening is contained in the following

Lemma 4.1: Let $X$ be a finite set and $r \in \mathcal{P}(X)$ with $r(x) > 0$ for all $x \in X$. Then for each $\delta > 0$, $k \in \mathbb{N}$, and any set $\Omega_k \subset \mathcal{P}(X)$ there is a subset $X_{k,\delta} \subset X^k$ with

1) $q^{\otimes k}(X_{k,\delta}) \ge 1 - (k+1)^{|X|} 2^{-kc\delta^2}$ for all $q \in \Omega_k$ with a universal constant $c > 0$.

2) $r^{\otimes k}(X_{k,\delta}) \le (k+1)^{|X|} 2^{-k(D(\Omega_k\|r) - \eta(\delta, r))}$

with $D(\Omega_k\|r) := \inf_{q \in \Omega_k} D(q\|r)$ and $\eta(\delta, r) := -\delta \log \frac{\delta}{|X|} - \delta \log r_{\min}$, where $r_{\min}$ denotes the smallest positive value of $r$.

Proof: The proof uses the well-known type bounding techniques from [5] and [21] and is therefore omitted. ■
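A (discrete) projection valued measure (PVM) on a finite-dimensional Hilbert space $\mathcal{K}$ is a set $\mathcal{M} := \{P_i\}_{i=1}^m$ consisting of projections $P_i \in \mathcal{B}(\mathcal{K})$ such that $\sum_{i=1}^m P_i = \mathbf{1}_{\mathcal{K}}$. For two states $\rho, \sigma \in \mathcal{S}(\mathcal{K})$ and any PVM $\mathcal{M}$ on $\mathcal{K}$ we define

$$S_{\mathcal{M}}(\rho\|\sigma) := \sum_{i=1}^m \mathrm{tr}(\rho P_i) \log \mathrm{tr}(\rho P_i) - \mathrm{tr}(\rho P_i) \log \mathrm{tr}(\sigma P_i)$$

if $(\mathrm{tr}(\rho P_i))_{i=1}^m \ll (\mathrm{tr}(\sigma P_i))_{i=1}^m$, and

$$S_{\mathcal{M}}(\rho\|\sigma) := \infty$$

else.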
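
The measured relative entropy just defined never exceeds the quantum relative entropy (monotonicity), which is the fact used repeatedly below. The following sketch illustrates this for a two-outcome PVM on a qubit. It is an illustration under stated assumptions only: $d = 2$, states and the projection given by Bloch vectors, an invertible $\sigma$, and illustrative function names.

```python
import math

def kl_divergence(p, q):
    total = 0.0
    for pa, qa in zip(p, q):
        if pa == 0.0:
            continue
        if qa == 0.0:
            return math.inf
        total += pa * math.log2(pa / qa)
    return total

def _dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def _entropy(r):
    lam = (1.0 + math.sqrt(_dot(r, r))) / 2.0
    if lam <= 0.0 or lam >= 1.0:
        return 0.0
    return -lam * math.log2(lam) - (1.0 - lam) * math.log2(1.0 - lam)

def relative_entropy(r, s):
    """S(rho||sigma) for qubit Bloch vectors r, s; assumes |s| < 1."""
    ns = math.sqrt(_dot(s, s))
    lp, lm = math.log2((1.0 + ns) / 2.0), math.log2((1.0 - ns) / 2.0)
    overlap = _dot(r, s) / ns if ns > 0 else 0.0
    return -_entropy(r) - ((lp + lm) / 2 + (lp - lm) / 2 * overlap)

def measured_relative_entropy(r, s, axis):
    """S_M(rho||sigma) for the PVM {P, 1 - P} with P = (1 + axis.sigma)/2, |axis| = 1."""
    pr = (1.0 + _dot(r, axis)) / 2.0   # tr(rho P)
    ps = (1.0 + _dot(s, axis)) / 2.0   # tr(sigma P)
    return kl_divergence([pr, 1.0 - pr], [ps, 1.0 - ps])

rho, sigma = (0.0, 0.0, 0.6), (0.3, 0.0, 0.2)
full = relative_entropy(rho, sigma)
along_z = measured_relative_entropy(rho, sigma, (0.0, 0.0, 1.0))
along_x = measured_relative_entropy(rho, sigma, (1.0, 0.0, 0.0))
print(full, along_z, along_x)
```

For commuting states measured in their common eigenbasis the two quantities coincide, which is the single-copy analog of the approximation statement in Theorem 4.2.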

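^2 We learned this from the paper [18] by Ogawa and Hayashi, who attribute this observation to Nagaoka.

Theorem 4.2: Let $\sigma \in \mathcal{S}(\mathcal{H})$ be invertible. Then for each $l \in \mathbb{N}$ there is a real number $\zeta_l(\sigma)$ with $\lim_{l \to \infty} \zeta_l(\sigma) = 0$ such that for any set $\Omega_l \subset \mathcal{S}(\mathcal{H})$ there is a PVM $\mathcal{M}_l = \{P_l, \mathbf{1}_{\mathcal{H}}^{\otimes l} - P_l\}$ on $\mathcal{H}^{\otimes l}$ with

$$S_{\mathcal{M}_l}(\rho^{\otimes l}\|\sigma^{\otimes l}) \ge l\,(S(\Omega_l\|\sigma) - \zeta_l(\sigma))$$

for all $\rho \in \Omega_l$, with $S(\Omega_l\|\sigma) := \inf_{\rho \in \Omega_l} S(\rho\|\sigma)$. Consequently,

$$\inf_{\rho \in \Omega_l} S_{\mathcal{M}_l}(\rho^{\otimes l}\|\sigma^{\otimes l}) \ge l\,(S(\Omega_l\|\sigma) - \zeta_l(\sigma)).$$

Proof: The proof is based on the following observation: Let $\mathcal{M}_l = \{P_l, \mathbf{1}_{\mathcal{H}}^{\otimes l} - P_l\}$ be any PVM on $\mathcal{H}^{\otimes l}$ with the properties

1) $\mathrm{tr}(\rho^{\otimes l} P_l) \ge 1 - \tau_{1,l}$ for all $\rho \in \Omega_l$ with $\lim_{l \to \infty} \tau_{1,l} = 0$ and

2) $\mathrm{tr}(\sigma^{\otimes l} P_l) \le 2^{-l(S(\Omega_l\|\sigma) - \tau_{2,l})}$ with $\lim_{l \to \infty} \tau_{2,l} = 0$.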
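Then using these relations we can lower-bound $S_{\mathcal{M}_l}$ for each $\rho \in \Omega_l$ as follows: First of all, since $\sigma$ is invertible we have

$$S(\rho^{\otimes l}\|\sigma^{\otimes l}) < \infty$$

for each $\rho \in \Omega_l$. Thus, the monotonicity of the relative entropy yields

$$S_{\mathcal{M}_l}(\rho^{\otimes l}\|\sigma^{\otimes l}) \le S(\rho^{\otimes l}\|\sigma^{\otimes l}) < \infty$$

for all $\rho \in \Omega_l$. Consequently we can lower-bound $\frac{1}{l} S_{\mathcal{M}_l}(\rho^{\otimes l}\|\sigma^{\otimes l})$ using the relations 1) and 2):

$$\begin{aligned}
\frac{1}{l} S_{\mathcal{M}_l}(\rho^{\otimes l}\|\sigma^{\otimes l}) &\ge -\frac{1}{l} H\bigl((\mathrm{tr}(\rho^{\otimes l} P_l), \mathrm{tr}(\rho^{\otimes l}(\mathbf{1}_{\mathcal{H}}^{\otimes l} - P_l)))\bigr) - \mathrm{tr}(\rho^{\otimes l} P_l)\,\frac{1}{l} \log \mathrm{tr}(\sigma^{\otimes l} P_l) \\
&\ge -\frac{1}{l} + \mathrm{tr}(\rho^{\otimes l} P_l)\,(S(\Omega_l\|\sigma) - \tau_{2,l}) \\
&\ge -\frac{1}{l} + (1 - \tau_{1,l})\,(S(\Omega_l\|\sigma) - \tau_{2,l}) \\
&\ge S(\Omega_l\|\sigma) - \zeta_l(\sigma), \qquad (7)
\end{aligned}$$

with

$$\zeta_l(\sigma) := (1 - \tau_{1,l})\,\tau_{2,l} - \tau_{1,l} \log \lambda_{\min}(\sigma) + \frac{1}{l}, \qquad (8)$$

where $\lambda_{\min}(\sigma)$ denotes the smallest eigenvalue of $\sigma$.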
Thus our remaining job is the construction of the PVM with the properties described above. To this end let $l \in \mathbb{N}$ and $\Omega_l \subset \mathcal{S}(\mathcal{H})$ be given. For $m \in \mathbb{N}$ we can find $k, y \in \mathbb{N}$ with $0 \le y < m$ such that $l = km + y$. Then applying exactly the same bounding technique as in the proof of Theorem 2 in [3], but using our Lemma 4.1 instead of their Lemma 1, we obtain for each $\delta > 0$ a projection $P_{l,\delta} \in \mathcal{B}(\mathcal{H})^{\otimes l}$ with

1) $\mathrm{tr}(\rho^{\otimes l} P_{l,\delta}) \ge 1 - (k+1)^{d^m} 2^{-kc\delta^2}$ with a universal constant $c > 0$ and where $d = \dim(\mathcal{H})$,

2) 

ilogtr(a«'P.,) < ^Sm\.)+d'^^i^^ 
I m 

,2m , .mNl0g(fc + l) 



km 



with 



77(5,(7) = -51og- - (51ogAinin(cr). 

a 



Choosing m = mi := [log^(Z^/^)] it is easily seen that for 
k = ki = with < yi < mi and Si := l^^/^ we have 



where 
and 

T2 J d 



lim Ti_i = and lim T2_; = 0, 

I — >oo ' / — >oo ' 



(9) 



log(m, + 1) ^ ^^2„., ^ + + ^^(^^^ ^) 



mi 



kimi 



The desired PVM is then given by $\mathcal{M}_l := \{P_l, \mathbf{1}_{\mathcal{H}}^{\otimes l} - P_l\}$ with $P_l := P_{l,\delta_l}$. ■
Remark 4.3: An alternative proof of Theorem 4.2 might be based on the techniques developed by Hayashi in [10], [11]. There he constructs a sequence of PVMs on $\mathcal{H}^{\otimes l}$ via the representation theory of Lie groups which depends merely on $\sigma$, and shows how to derive Stein's Lemma. Thus we would be forced to bound the errors of the first and second kind in Hayashi's setting uniformly for the whole family $\Omega_l$ in order to obtain a universal abelian approximation of the quantum relative entropy.

V. Capacity of Compound CQ-Channels 

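Let $T$ be an arbitrary compound channel and for a fixed $p \in \mathcal{P}(A)$ define

$$\Omega_p := \Bigl\{ \rho_t : \rho_t = \sum_{x \in A} p(x)\,|x\rangle\langle x| \otimes D_{t,x},\ t \in T \Bigr\},$$

where each $\rho_t \in \Omega_p$ is seen as a density operator in $\mathcal{A}_{\mathrm{diag}} \otimes \mathcal{B}(\mathcal{H})$ with

$$\mathcal{A}_{\mathrm{diag}} := \bigoplus_{x \in A} \mathbb{C}\,|x\rangle\langle x|$$

being the algebra of operators diagonal w.r.t. the basis $\{|x\rangle\}_{x \in A}$ of $\mathbb{C}^{|A|}$.^3 Moreover, for each $t \in T$ we set

$$\sigma_t := \sum_{x \in A} p(x) D_{t,x}.$$

In what follows we identify the probability distribution $p$ with a diagonal density operator, i.e. we set

$$p = \sum_{x \in A} p(x)\,|x\rangle\langle x| \in \mathcal{A}_{\mathrm{diag}}.$$

It is well known that

$$S(\rho_t\|p \otimes \sigma_t) = \chi(p, W_t)$$

holds, where $S(\rho_t\|p \otimes \sigma_t)$ is the relative entropy.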

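Lemma 5.1 (Donald's Inequality): Consider any $t, t' \in T$. Then

$$S(\rho_{t'}\|p \otimes \sigma_t) \ge S(\rho_{t'}\|p \otimes \sigma_{t'})$$

and equality holds iff $\sigma_{t'} = \sigma_t$.

Proof: The claimed inequality can be seen as a special instance of Donald's identity [7]. We give a short direct proof for the reader's convenience. If $\mathrm{supp}(\rho_{t'})$ is not dominated by $\mathrm{supp}(p \otimes \sigma_t)$ we have $S(\rho_{t'}\|p \otimes \sigma_t) = +\infty$. But on the other hand $S(\rho_{t'}\|p \otimes \sigma_{t'}) = \chi(p, W_{t'}) < +\infty$ for any $t' \in T$. Thus the claimed inequality is trivially fulfilled and is always strict in this case.

Assume now that $\mathrm{supp}(\rho_{t'})$ is dominated by $\mathrm{supp}(p \otimes \sigma_t)$. Then we obtain

$$\begin{aligned}
S(\rho_{t'}\|p \otimes \sigma_t) &= \mathrm{tr}(\rho_{t'} \log \rho_{t'} - \rho_{t'} \log p \otimes \sigma_t) \\
&= -S(\rho_{t'}) - \mathrm{tr}(\rho_{t'} \log p \otimes \sigma_t) \\
&= -S(\rho_{t'}) + S(p) - \mathrm{tr}(\sigma_{t'} \log \sigma_t) \\
&= -S(\rho_{t'}) + S(p) - \mathrm{tr}(\sigma_{t'} \log \sigma_t) + \mathrm{tr}(\sigma_{t'} \log \sigma_{t'}) - \mathrm{tr}(\sigma_{t'} \log \sigma_{t'}) \\
&= -S(\rho_{t'}) + S(p) + S(\sigma_{t'}) + \mathrm{tr}(\sigma_{t'} \log \sigma_{t'} - \sigma_{t'} \log \sigma_t) \\
&= S(\rho_{t'}\|p \otimes \sigma_{t'}) + S(\sigma_{t'}\|\sigma_t) \\
&\ge S(\rho_{t'}\|p \otimes \sigma_{t'}),
\end{aligned}$$

where we used the fact that $S(\sigma_{t'}\|\sigma_t) \ge 0$ in the last line. We are done now since $S(\sigma_{t'}\|\sigma_t) = 0$ iff $\sigma_{t'} = \sigma_t$. ■

^3 $\mathcal{A}_{\mathrm{diag}}$ has a natural structure of a *-algebra, thus $\mathcal{A}_{\mathrm{diag}} \otimes \mathcal{B}(\mathcal{H})$ is an admissible construction.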
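
The identity behind Donald's inequality, $S(\rho_{t'}\|p \otimes \sigma_t) = \chi(p, W_{t'}) + S(\sigma_{t'}\|\sigma_t)$, can be verified numerically. The sketch below is illustrative only and rests on two assumptions: qubit outputs given by Bloch vectors, and the block-diagonal structure of $\rho_{t'}$ and $p \otimes \sigma$, which gives $S(\rho_{t'}\|p \otimes \sigma) = \sum_x p(x) S(D_{t',x}\|\sigma)$. The two channels and the input distribution are arbitrary test data.

```python
import math

def _norm(v):
    return math.sqrt(sum(c * c for c in v))

def entropy(r):
    lam = (1.0 + _norm(r)) / 2.0
    if lam <= 0.0 or lam >= 1.0:
        return 0.0
    return -lam * math.log2(lam) - (1.0 - lam) * math.log2(1.0 - lam)

def relative_entropy(r, s):
    """S(rho||sigma) for qubit Bloch vectors r, s; assumes |s| < 1."""
    ns = _norm(s)
    lp, lm = math.log2((1.0 + ns) / 2.0), math.log2((1.0 - ns) / 2.0)
    overlap = sum(a * b for a, b in zip(r, s)) / ns if ns > 0 else 0.0
    return -entropy(r) - ((lp + lm) / 2 + (lp - lm) / 2 * overlap)

def mean_state(p, outputs):
    """Bloch vector of sigma_t = sum_x p(x) D_{t,x}."""
    return tuple(sum(px * r[i] for px, r in zip(p, outputs)) for i in range(3))

def holevo(p, outputs):
    return entropy(mean_state(p, outputs)) - sum(px * entropy(r) for px, r in zip(p, outputs))

def cq_relative_entropy(p, outputs, s):
    """S(rho_t' || p (x) sigma), evaluated block by block."""
    return sum(px * relative_entropy(r, s) for px, r in zip(p, outputs) if px > 0)

p = [0.3, 0.7]
w_a = [(0.0, 0.0, 0.9), (0.9, 0.0, 0.0)]    # plays the role of W_{t'}
w_b = [(0.0, 0.5, 0.0), (0.0, 0.0, -0.5)]   # plays the role of W_t
sigma_a, sigma_b = mean_state(p, w_a), mean_state(p, w_b)
lhs = cq_relative_entropy(p, w_a, sigma_b)
rhs = holevo(p, w_a) + relative_entropy(sigma_a, sigma_b)
print(lhs, rhs)
```

Replacing `sigma_b` by `sigma_a` in `lhs` recovers $\chi(p, W_{t'})$ itself, which is the equality case of the lemma.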
Remark 5.2: A glance at the proof of Lemma 5.1 shows that the following stronger conclusion holds^4: For any $t' \in T$ and any state $\sigma \in \mathcal{S}(\mathcal{H})$

$$S(\rho_{t'}\|p \otimes \sigma) \ge S(\rho_{t'}\|p \otimes \sigma_{t'})$$

with equality iff $\sigma = \sigma_{t'}$.

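For given $p \in \mathcal{P}(A)$ and $t \in T$ we set

$$S(\Omega_p\|p \otimes \sigma_t) := \inf_{r \in T} S(\rho_r\|p \otimes \sigma_t).$$

Lemma 5.3: For each $p \in \mathcal{P}(A)$ we have

$$\inf_{t' \in T} S(\Omega_p\|p \otimes \sigma_{t'}) = \inf_{t' \in T} S(\rho_{t'}\|p \otimes \sigma_{t'}).$$

Proof: It is clear that $\inf_{t' \in T} S(\Omega_p\|p \otimes \sigma_{t'}) \le \inf_{t' \in T} S(\rho_{t'}\|p \otimes \sigma_{t'})$ holds. For the reverse inequality we choose an arbitrary $\varepsilon > 0$ and a $t(\varepsilon) \in T$ with

$$S(\Omega_p\|p \otimes \sigma_{t(\varepsilon)}) \le \inf_{t' \in T} S(\Omega_p\|p \otimes \sigma_{t'}) + \frac{\varepsilon}{2}, \qquad (11)$$

and an $s(\varepsilon) \in T$ such that

$$\begin{aligned}
S(\rho_{s(\varepsilon)}\|p \otimes \sigma_{t(\varepsilon)}) &\le S(\Omega_p\|p \otimes \sigma_{t(\varepsilon)}) + \frac{\varepsilon}{2} \\
&\le \inf_{t' \in T} S(\Omega_p\|p \otimes \sigma_{t'}) + \varepsilon, \qquad (12)
\end{aligned}$$

where the last line follows from (11). Donald's inequality, Lemma 5.1, shows that $S(\rho_{s(\varepsilon)}\|p \otimes \sigma_{s(\varepsilon)}) \le S(\rho_{s(\varepsilon)}\|p \otimes \sigma_{t(\varepsilon)})$, and consequently by (12) that

$$\inf_{t' \in T} S(\rho_{t'}\|p \otimes \sigma_{t'}) \le \inf_{t' \in T} S(\Omega_p\|p \otimes \sigma_{t'}) + \varepsilon$$

holds for every $\varepsilon > 0$. This shows our claim. ■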

A. The Direct Part of the Coding Theorem 

The crucial point in our code construction for the compound cq-channels will be the following one-shot version of the coding theorem, which is based on (and is an easy consequence of) the ideas developed by Hayashi and Nagaoka in [13]. In order to formulate the result properly we need some notation. Let $W : K \to \mathcal{S}(\mathcal{K})$ be any cq-channel with finite input alphabet $K$ and finite-dimensional output Hilbert space $\mathcal{K}$. Let $D_k := W(k)$ for all $k \in K$. For any $w \in \mathcal{P}(K)$ we consider the states

$$\rho := \sum_{k \in K} w(k)\,|k\rangle\langle k| \otimes D_k,$$

and $w \otimes \sigma$ with

$$\sigma := \sum_{k \in K} w(k) D_k$$

acting on the Hilbert space $\mathbb{C}^{|K|} \otimes \mathcal{K}$. Let $\mathcal{B}_{\mathrm{diag}}$ denote the set of operators on $\mathbb{C}^{|K|}$ that are diagonal with respect to the orthonormal basis $\{|k\rangle\}_{k \in K}$.

^4 We would like to thank the Associate Editor for pointing out this improvement of Lemma 5.1.

Theorem 5.4 (Hayashi & Nagaoka [13]): Given any cq-channel $W : K \to \mathcal{S}(\mathcal{K})$ and $w \in \mathcal{P}(K)$ with finite set $K$ and finite-dimensional Hilbert space $\mathcal{K}$. Let $P \in \mathcal{B}_{\mathrm{diag}} \otimes \mathcal{B}(\mathcal{K})$ be a projection with

1) $\mathrm{tr}(\rho P) \ge 1 - \lambda$ with some $\lambda > 0$ and

2) $\mathrm{tr}((w \otimes \sigma) P) \le 2^{-\mu}$ for some $\mu > 0$.

Then for each $0 < \gamma < \mu$ we can find $k_1, \ldots, k_{[2^{\mu-\gamma}]} \in K$ and $b_1, \ldots, b_{[2^{\mu-\gamma}]} \in \mathcal{B}(\mathcal{K})$ with $b_i \ge 0$ and $\sum_{i=1}^{[2^{\mu-\gamma}]} b_i \le \mathbf{1}_{\mathcal{K}}$ such that

$$\frac{1}{[2^{\mu-\gamma}]} \sum_{i=1}^{[2^{\mu-\gamma}]} \bigl(1 - \mathrm{tr}(D_{k_i} b_i)\bigr) \le 2 \cdot \lambda + 4 \cdot 2^{-\gamma}.$$

Proof: All arguments needed in the proof of this theorem are contained explicitly or implicitly in [13]. We provide the proof in the Appendix for completeness and in order to make the presentation more self-contained. ■
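As in the classical approaches to the direct part of the coding theorem we need a discrete approximation of our compound cq-channel. A partition $\Pi$ of $\mathcal{S}(\mathcal{H})$ is a family $\{\pi_1, \ldots, \pi_y\}$ of subsets of $\mathcal{S}(\mathcal{H})$ such that $\pi_i \cap \pi_j = \emptyset$ for $i \neq j$ and $\mathcal{S}(\mathcal{H}) = \bigcup_{i=1}^{y} \pi_i$ hold. We say that the diameter of the partition $\Pi = \{\pi_1, \ldots, \pi_y\}$ of $\mathcal{S}(\mathcal{H})$ is at most $\kappa > 0$ if

$$\sup_{\rho, \sigma \in \pi_i} \|\rho - \sigma\|_1 \le \kappa \quad \forall i = 1, \ldots, y.$$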

We borrow from [22] a basic partitioning result for S{Ti) 
which is proven by a packing argument in the -dimensional 
cube. 

Theorem 5.5 (Winter, Lemma IL8 in [22]): For any k > 
there is a partition 11 — {iTi, . . . , tTj,} of S{'H) having diameter 
at most K with y < Kk^"^ , where the number K > depends 
only on the dimension d of Ti. 

Applying this result -times outputs for each k > a 
partition 11 of the set of cq-channels CQ{A,'H) with input 
alphabet A and output Hilbert space TL with at most K^^^ ■ 
i^-\A\d elements. For n e N we choose k ~ k„ :— and a 
partition 11^^^ — {tti „, . . . ,TTy^n} of CQ{A,H) with at most 
jj^l-^l . j^\^\d elements and diameter not exceeding k„. This 
IIk^ produces a partition 

:=K„nT:^ = l,...,2;,7r,,„nr^0}, 

of the given compound cq-channel T. From each Tr^.n nT 7^ 
we select one cq-channel Wt^ and denote this finite set of 
channels by T^. 

Let $U : A \to \mathcal{S}(\mathcal{H})$ denote the useless cq-channel $U(x) := (1/d) \cdot \mathbf{1}_{\mathcal{H}}$. We set $W_t' := (1 - \frac{1}{n^2}) W_t + \frac{1}{n^2} U$ for all $t \in T_n'$. The resulting set of channels will be denoted by $T_n$. Written in terms of density operators this defining relation means that we consider

$$D_{t,x}' := \Bigl(1 - \frac{1}{n^2}\Bigr) D_{t,x} + \frac{1}{n^2 d}\,\mathbf{1}_{\mathcal{H}} \qquad (13)$$

for all $t \in T_n'$ and all $x \in A$.
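
The effect of mixing with the useless channel can be illustrated numerically: the mixed output states stay close to the original ones in trace norm, while their smallest eigenvalue is bounded away from zero, which keeps the relative entropies appearing later finite. The sketch below is an illustration under assumptions that are not taken verbatim from the text: qubit outputs ($d = 2$) given by Bloch vectors, and a mixing weight `eps` that is set to $1/n^2$ only as an example of a weight vanishing with $n$.

```python
import math

def mix_with_useless(r, eps):
    """Bloch vector of (1 - eps) * D + eps * (1/2) * identity for a qubit state D."""
    return tuple((1.0 - eps) * c for c in r)

def min_eigenvalue(r):
    """Smallest eigenvalue of the qubit state with Bloch vector r."""
    return (1.0 - math.sqrt(sum(c * c for c in r))) / 2.0

def trace_norm_distance(r, s):
    """||rho - sigma||_1 for qubits; equals the Euclidean distance of the Bloch vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(r, s)))

n = 10
eps = 1.0 / n ** 2                      # assumed mixing weight
pure_state = (0.0, 0.0, 1.0)            # worst case: smallest eigenvalue 0
mixed = mix_with_useless(pure_state, eps)
print(min_eigenvalue(mixed), trace_norm_distance(pure_state, mixed))
```

For a pure input state the smallest eigenvalue after mixing equals `eps / 2`, i.e. `eps / d`, and the trace-norm perturbation is at most `2 * eps`.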

Lemma 5.6: Let T be any compound cq-channel and 
choose n € N. Then the associated compound cq-channel T„ 
has the following properties: 

1) |T„| < X'^l •nl^l'i'. 

2) For each t G T we can find at least one s G r„ such 
that for all x" G A" 

4 



lA 



7^1 



< 



where || • ||i denotes the trace distance. The same 
statement holds if we reverse the roles of t £ T and 

s e T„. 

3) There is a constant C — C{d) such that for each p e 
ViA) and all n e N 

I min x(p, W'^ - inf xb, < C/n 

holds. 

Proof: The first part of the lemma is clear by our 
construction of T„. 

The second assertion follows from the general fact that for 
states pi, . . . , p„, (Ji, . . . , (T„ e S{l-L) the relation 



llPi 



) p„ - (71 (g) . . . (g) CT„||i < ^\\Pi - CTilll 

i=l 



holds and that for each t £ T we can find s' £ with 

\\Dt,x - Ds'^xWi < for all a; € ^ and to each s' £ 

there is obviously s € T„ with ||Z?s',2^ — ^'s.xHi < for 
all X e A . 

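The last part of the lemma is easily deduced from the Fannes inequality [8], which states that for any states $\rho, \sigma \in \mathcal{S}(\mathcal{H})$ with $\|\rho - \sigma\|_1 \le \delta \le 1/e$ we have $|S(\rho) - S(\sigma)| \le \delta \log d - \delta \log \delta$. Indeed, for each $n \in \mathbb{N}$ choose $s_n \in T_n$ with

$$\chi(p, W_{s_n}') = \min_{s \in T_n} \chi(p, W_s'). \qquad (14)$$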



Then observing that 



xeA 



and that we can find t £T with - ^s„,xlli ^ ^/n^ for 

all a; e A leads via Fannes inequality to 

\x{p,Wi) ^ x{P,Wt)\ <2{^logd~ ^\og^), (15) 

provided that n > y^. (fT4l i and ( fTsl l show that 

infvfp, Wf) < minrfp, W') 

+2(4logd- ^log^) 
= mm x(p, W^) + 0(n~^). 

A similar argument shows the reverse inequality and we are 
done. ■ 



Remark 5.7: At this point we pause for a moment to indicate why our discretization Lemma 5.6 does not suffice to reduce the capacity problem for arbitrary sets of channels to the finite case solved by Datta and Dorlas [6]. Let us assume that we want to construct codes of block length $n$ for the channel $T_n$. The proof strategy in [6], translated into the setting of our Lemma 5.6, would consist of a measurement that detects the branch from $T_n$ combined with reliable codes for the individual channels from $T_n$. In order to detect which channel is in use during the transmission, Datta and Dorlas construct a sequence $x^{mL_n} \in A^{mL_n}$, $L_n := \binom{|T_n|}{2}$, and a PVM $\{P_t^{mL_n}\}_{t \in T_n}$ in $\mathcal{B}(\mathcal{H}^{\otimes mL_n})$ with

$$\mathrm{tr}\bigl(P_t^{mL_n}\, W_t'^{\,mL_n}(x^{mL_n})\bigr) \ge (1 - |T_n| f^m)^{|T_n| - 1}, \qquad (16)$$

where $f \in (0,1)$. It is easily seen, using standard volumetric arguments with respect to the Hausdorff measure on the set of cq-channels, that for open sets $T$ (w.r.t. the relative topology) of channels $|T_n| \ge \mathrm{poly}(n)$ with degree strictly larger than 1. Hence, $L_n \sim \mathrm{poly}(n)$. Since the rightmost quantity in (16) has to approach 1, we have to choose $m = m(n)$ as an increasing sequence depending on $n$. Thus for large $n$ we have $m_n L_n = m_n\,\mathrm{poly}(n) > n$, and no block length is left for coding.

In the course of the proof of Theorem 5.10 we will need two probabilistic inequalities which go back to the work of Blackwell, Breiman, and Thomasian [4] and Hoeffding [15]. Let $\{V_t\}_{t \in T}$ be a finite set of stochastic matrices $V_t : X \to J$ with finite sets $X$ and $J$. For $r \in \mathcal{P}(X)$ we set

$$p_t(x, j) := r(x) V_t(j|x) \quad (x \in X,\ j \in J),$$

and

$$q_t(j) := \sum_{x \in X} r(x) V_t(j|x).$$


Moreover, for each a e N we define the averaged channel 

V :X'' ^ J" by 



the joint input-output distribution 

p'^ix",]") := r'^''ix'')V''{f\x''), 

and 



For each t £T and a e N let 



a ^ \ " „(»a 



and 



^^^(x^ f^)-^logyi^^ 



(17) 



(18) 



where x°- £ X" and £ J". 

Theorem 5.8 (Blackwell, Breiman, Thomasian [4]): With 
the notation introduced in preceding paragraph we have for 

all a, /3 G K 



teT 
Our proof of Theorem 5.10 will also require Hoeffding's tail inequality:

Theorem 5.9 (Hoeffding [15]): Let $X_1, \ldots, X_a$ be independent real-valued random variables such that each $X_i$ takes values in the interval $[u_i, o_i]$ with probability one, $i = 1, \ldots, a$. Then for any $\tau > 0$ we have

$$P\Bigl(\sum_{i=1}^{a} (X_i - E(X_i)) \ge a\tau\Bigr) \le e^{-\frac{2a^2\tau^2}{\sum_{i=1}^{a}(o_i - u_i)^2}}$$

and

$$P\Bigl(\sum_{i=1}^{a} (X_i - E(X_i)) \le -a\tau\Bigr) \le e^{-\frac{2a^2\tau^2}{\sum_{i=1}^{a}(o_i - u_i)^2}}.$$
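
Hoeffding's bound can be compared with an exact tail probability in a simple case. The sketch below is only an illustration and is not used in the proof: it assumes i.i.d. Bernoulli variables, so that $u_i = 0$, $o_i = 1$ and the sum is binomially distributed, and its function names are illustrative.

```python
import math

def binomial_upper_tail(a, q, threshold):
    """Exact P(S >= threshold) for S ~ Binomial(a, q)."""
    k0 = max(math.ceil(threshold - 1e-12), 0)   # small slack against rounding
    return sum(math.comb(a, k) * q ** k * (1.0 - q) ** (a - k) for k in range(k0, a + 1))

def hoeffding_bound(a, tau, widths):
    """exp(-2 a^2 tau^2 / sum_i (o_i - u_i)^2), with widths[i] = o_i - u_i."""
    return math.exp(-2.0 * a ** 2 * tau ** 2 / sum(w * w for w in widths))

a, q = 50, 0.3
for tau in (0.05, 0.1, 0.2):
    # P(sum_i (X_i - q) >= a * tau) = P(S >= a * (q + tau))
    exact = binomial_upper_tail(a, q, a * (q + tau))
    bound = hoeffding_bound(a, tau, [1.0] * a)
    print(tau, exact, bound)
```

For unit-width intervals the exponent reduces to $-2a\tau^2$, so the bound decays exponentially in the number $a$ of summands, which is the feature exploited in the proof of the direct part.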



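With all these preliminary results we are now able to state and prove our main objective:

Theorem 5.10 (Direct Part): Let $T$ be an arbitrary compound cq-channel. Then for each $\lambda \in (0, 1)$ and any $\alpha > 0$ we can find $(n, M_n, \lambda)_{\max}$-codes with

$$\frac{1}{n} \log M_n \ge \max_{p \in \mathcal{P}(A)} \inf_{t \in T} \chi(p, W_t) - \alpha,$$

for all $n \in \mathbb{N}$ with $n \ge n_0(\alpha, \lambda)$. Consequently, for each $\lambda \in (0,1)$

$$C(T, \lambda) \ge \max_{p \in \mathcal{P}(A)} \inf_{t \in T} \chi(p, W_t).$$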

Proof: Our strategy will be, roughly, to construct a "good" projection for the averaged channel $\overline{W}^n := \frac{1}{|T_n|} \sum_{t \in T_n} W_t'^{\,n}$ using Theorem 4.2, Theorem 5.8 and Theorem 5.9. This means that for a suitably chosen input distribution $p \in \mathcal{P}(A)$, the associated state

$$\rho^{(n)} := \sum_{x^n \in A^n} p^{\otimes n}(x^n)\,|x^n\rangle\langle x^n| \otimes \overline{W}^n(x^n)$$

and the resulting product of the marginal states $p^{\otimes n} \otimes \sigma^{(n)}$, we will find a projection $P_n \in (\mathcal{A}_{\mathrm{diag}} \otimes \mathcal{B}(\mathcal{H}))^{\otimes n}$ with

1) $\mathrm{tr}(\rho^{(n)} P_n) \approx 1$, and

2) $\mathrm{tr}((p^{\otimes n} \otimes \sigma^{(n)}) P_n) \lesssim 2^{-n \inf_{t \in T} \chi(p, W_t)}$.

Then we will apply Theorem 5.4 to obtain a good code for $\overline{W}^n$. This code performs well for the compound channel $T_n$ since the error probability depends affinely on the channel. Finally, by Lemma 5.6 we see that the code obtained in this way is also reliable for the original channel $T$.

Let $p = \mathrm{argmax}_{p' \in \mathcal{P}(A)} (\inf_{t \in T} \chi(p', W_t))$. We assume w.l.o.g. that $\inf_{t \in T} \chi(p, W_t) > 0$, because otherwise the assertion of the theorem is trivially true.

Our goal is to construct (n, M„, ■|),„ax-codes C„ for the 

approximating channel r„ with 

for all sufficiently large n e N. Then by Lemma 15.61 C„ is 
also an (n, Af„, ^ + -)max-code for the original channel T. 



Choosing n large enough we can ensure that 

proof would be accomplished. 

In what follows we use the abbreviations 



< 



and our 



^P,n ■■= {p't ■■ p't = ^p(a;)|a;>(a;| ® D[ ^,te t„} 



and for t G r„ we write 

xeA 

where p G V{A) is arbitrary. Note that by (T3[ we have for 
each t 

Amin (P (Xl CrO > -Pmin ■ (19) 

Moreover it is clear from the definition of T„ that supp(p() is 
dominated by supp(p (g) a'J for each t, s G Tn and supp(p (g 
(y's) — supp(p) (g) l-H for all s G T„. Now choose any s G 
T„. By the properties of the supports just mentioned we may 
assume w.l.o.g. that p® as is invertible. Then for fixed / G N 
we can find a, 6 G N with n — al + b, < h < I, and 
obtain from Theorem 1421 a PVM Mi = {Pi,i,P2,i} with 
P^d e (^diag ® B{n))®\ i = 1, 2, with 



SM,{p'T\\{P®<r') > KS{n,,^\\p(^a',)-Oip®a'J) 

;)), 

(20) 



where we have used Lemma [ 

Since P,^i G (Aiag ® B{n))®^ for i = 1,2 we can find 
projections {r, j^-ig^; C B{H)^\ i = 1,2, with 

-Pz,z= 51 k')(a;'|<8r,,,z (z = l,2). 

x'£A' 



The relation 



implies 



(1^,,,,^ (g) 1h) 



Pl.l + P2. 



(21) 



For each x G A let {e^; jj^^l'^ be an orthonormal basis 
of the range of ^.i and {e^;! jl^^jjj-^ ^j^j^ an orthonormal 
basis of the range of r2_xi- Then by ( 1211 1 the set ® 
e^i jl^ig^i jLi is orthonormal basis of (C'"^' i^H)^\ and 
we have by definition 

and similarly 

d' 

x'eA' j=tr(rj_^,) + l 

i.e. the PVM Qiis) := |e,,,,)(e,.,,|}^,g^, j^^ 

consisting of one-dimensional projections is a refinement of 
the PVM Ml = {Pi,i, P2,;}- Thus by the monotonicity of the 
relative entropy and ( |20] | we obtain 

SQ,^s){pT\\{P<»<r')>l{^}nx{p,W;)-Ci{p'E>ai)), 

(22) 

for all t E Tn, and consequently 

mm mm5Q,(,)(p'f ||(p®a:)«') > l{mmxip,Wi)-Ci{p)), 

(23) 



g 



where 



Clip) = maxCi(p«)cr^; 



Claim: For the choice I = In = [y/n\ we have 

limO„(p) = 0. (24) 

n — >C30 

Recall from the proof of Theorem 14.21 that 



O„(p<8)0's) = (l-Ti,;„)r2J„(s)-Ti,;„ logAmin(p«'0-^) + -l-, 

where ti ; and r2 ; = T2.;(s) are defined in (|9]l and ( fTOl i. Our 
remaining goal is to prove 



lim maxT2,;„(s) = 0, 

n— *CXD s^Tn ' ^ 



(25) 



and 



lim Ti,i„ max(- log Amin(p (8i CTs)) = 0- (26) 

In order to simplify the notation and streamline the subsequent 
arguments we introduce following terminology: Let (a„)„gN 
and (bn)n&i be two sequences of non-negative reals. We write 
On ~+ bn if lim„^oc) > 0. The validity of the assertions 
([25T l and (|26] | can be easily deduced from ( fT9] l and the facts 
that ki^ i^ff 1, , and fc,„,52^ j^^^. 

For example we have by ( fT9l l 

Pmin 



< Tij„ logAmin(p<X) CTs)) < -Tl,i„ log 

-fc,„5f„(c-o(„«)-^log^) 



n2 



which tends to as n ^ cx) since fci (5P ^+ , Thus, 
t26t is proven. In order to prove (|25j) it suffices to show that 

lim max(-(5;„ log(5;„ - (5/„ log Aminb crl,)) = 0- 

But this is clear from 

max(-^i„ log(5i„ - log Ai„i„ (p (g) cr^ ) ) < -Si^ log(5i„ 



-'5i„ log 



and (5;„ ~+ 7i 

Choose s* e Tn such that 



s = argmm 



(min5s,(,)(p'f (27) 



teT„ 



and consider the corresponding PVM Q;„(s*) = {|a;'")(a;'" |c 
|ea;'r. j)(e^!„ For each i e T„ we define 

Pt{x^",j) := tr(p'f' "|a;'")(a;'"| «) |e^,,.j)(e^,,.j|) 
= p®'"(a;'")tr(i?;^,,Je,,„,,)(e,,„,,|) 



where for each t E the stochastic matrix Vt : A'" ^ 
{!,..., d'"} is given by 

for x'" G e {!,...,£/'"}. By (|27]i, (l23]l, and dH we 

get 

mm I{p^'",Vt) > Limm x{p, W[) - 0„ (p)), (28) 



with lim„^tx) On (-P) — 0- < l28T l implies together with Lemma 
EH that 

i min J(p»'" , Vt) > inf x(p, W^t) " - " 0„ (p)- (29) 

<„ tGT„ t6T n 

This implies that we can find ni(ei) such that 

i min /(p®'" , 14) > i inf xb, Wt) > (30) 

for all $n \ge n_1(\varepsilon_1)$. The last inequality in (30) holds by our
general assumption that $\inf_{t \in T} \chi(p, W_t) > 0$. Choose any $n \ge n_1(\varepsilon_1)$. Let



and 



e := <^ 6* e M : < 61 < - inf xip, Wt) 



$$I_n := \min_{t \in T_n} I(p^{\otimes l_n}, V_t) = \min_{s \in T_n} \min_{t \in T_n} D(P_t \,\|\, r \otimes q_s), \qquad (31)$$

where $r := p^{\otimes l_n}$ and $q_t(j) := \sum_{x^{l_n}} r(x^{l_n}) V_t(j|x^{l_n})$ for all
$j \in \{1,\dots,d^{l_n}\}$. Moreover, in order to simplify our notation,
we set $X := A^{l_n}$ and $J := \{1,\dots,d^{l_n}\}$ and suppress the
$n$-dependence of $a$ and $l$ temporarily.
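The second equality in (31) rests on a standard fact: for a joint distribution $P_t(x,j) = r(x)V_t(j|x)$, the relative entropy $D(P_t \| r \otimes q)$ is minimized over output distributions $q$ by the true output marginal $q_t$, where it equals the mutual information. The following minimal Python sketch checks this numerically. The distributions `r`, `V_t`, `V_s` are hypothetical toy data, not taken from the paper, and the function names are illustrative only.

```python
import math

def kl(p, q):
    """Relative entropy D(p || q) in bits for two distributions given as flat lists."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def joint(r, V):
    """Joint distribution P(x, j) = r(x) * V(j|x), flattened row by row."""
    return [r[x] * V[x][j] for x in range(len(r)) for j in range(len(V[0]))]

def product(r, q):
    """Product distribution (r x q)(x, j) = r(x) * q(j), flattened row by row."""
    return [rx * qj for rx in r for qj in q]

def output_marginal(r, V):
    """Output distribution q(j) = sum_x r(x) * V(j|x)."""
    return [sum(r[x] * V[x][j] for x in range(len(r))) for j in range(len(V[0]))]

# Hypothetical toy data: one input distribution and two stochastic matrices.
r = [0.3, 0.7]
V_t = [[0.9, 0.1], [0.2, 0.8]]
V_s = [[0.6, 0.4], [0.5, 0.5]]

P_t = joint(r, V_t)
q_t = output_marginal(r, V_t)
q_s = output_marginal(r, V_s)

I_t = kl(P_t, product(r, q_t))    # mutual information I(r, V_t)
D_ts = kl(P_t, product(r, q_s))   # reference state built from another branch s
gap = D_ts - I_t                  # equals D(q_t || q_s), hence non-negative
```

The identity $D(P_t \| r \otimes q_s) = I(r, V_t) + D(q_t \| q_s)$ explains why taking the minimum over $s$ as well as over $t$ does not lower the value below $\min_t I(r, V_t)$.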

Recalling the definition of i° and from ( fTTl ) and ( fTSl ) we 
obtain from Theorem |5J] for a := I„ - 210, /S := W, e Q 

P(«" <In- 210) < ^ 51 ^^('t < In " Z^) + |r„|2-""^. 



(32) 

Our construction of the compound cq-channel T„ implies that 

for all t e Tn,x e X,j e J 

1 



Consequently 
for all j G J, and 



- Hogn^d < log ^''^^}^^ < llogn^d. (33) 

Since is a sum of i.i.d. random variables each of which 
takes values in $[-l \log n^2 d,\; l \log n^2 d]$ by (33), we can apply
Theorem 5.9 and obtain



"tiit < In ~l0)<e iiH^o^n^^r^ 



(34) 



for all t G Tn since /„ < Et(i^) for all t G T„. (O and (O 
show that 



""(j" < - 2;6') < e TSo^^I^ + |T„|2 



(35) 



Thus the set Xa,0 C X" x J° = A''' x {1, . . . , d'}" given by 
Xa^e ■.^{{x\f):i%x\r)>ln~l0}, 

is used to construct an orthogonal projection $P_{la,\theta} \in (\mathcal{B}_{\mathrm{diag}} \otimes \mathcal{B}(\mathcal{H}))^{\otimes la}$ defined by

where we identify each $x^a \in X^a$ with a sequence in $A^{la}$.
Moreover

By the definition of set Xa^e the relations 



/"(^a,e) > 1 - e 1^0^^ - |r„|2 



and 



(36) 



(37) 



hold. (36) and (37) imply by definition of the projection
$P_{la,\theta} \in (\mathcal{B}_{\mathrm{diag}} \otimes \mathcal{B}(\mathcal{H}))^{\otimes la}$ that



tr(p('"^F;a,e) > 1 - e ml^F^ - |T„|2 



(38) 



and 



tr((p®''^ a('"')F,a,e) < 2-'''^^--^'^\ (39) 



where 



IT, 



teT„ 



teT„ 



and 



1 /(»ia 



Since n = aZ + 5, < 6 < ^, we can define a projection 

Pn,e G (^diag ® by 

yield then 

(40) 



and 



tr((p®"(^a("))P„,e) < 2-""(-^"-2'"«) 

(41) 

by ( |29] l where e„ ^ + 0„(p)- Thus for n > 712(6') we 
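conclude from (41), the fact that $\lim_{n \to \infty} \varepsilon_n = 0$, and $0 \le b_n < [\sqrt{n}]$ that

$$\mathrm{tr}\big((p^{\otimes n} \otimes \sigma^{(n)}) P_{n,\theta}\big) \le 2^{-n(\inf_{t \in T} \chi(p, W_t) - 3\theta)}. \qquad (42)$$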

Since the states G (^diag ® and cr(") e 

B{Ti,)'^" correspond to the averaged cq-channel VF" — 
lT~f SteT„ ^PPly Theorem 15.41 with 

-a„i„6l 



A = A„ := e i^o-^"")^ + |r„|2- 
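$$\mu = \mu_n := n\Big(\inf_{t \in T} \chi(p, W_t) - 3\theta\Big),$$

$$\gamma = \gamma_n := n\theta$$

and end up with an $(n, M'_n, \lambda'_n)_{\mathrm{av}}$-code with $M'_n = [2^{n(\inf_{t \in T} \chi(p, W_t) - 4\theta)}]$
for the channel $W^n = \frac{1}{|T_n|} \sum_{t \in T_n} W_t'^{\,n}$ where

$$\lambda'_n = 2\lambda_n + 4 \cdot 2^{-n\theta}.$$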
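The bookkeeping in this application of Theorem 5.4 can be traced with a few lines of Python. The sketch below assumes, as the proof in Appendix I indicates, that Theorem 5.4 yields a code of size $[2^{\mu - \gamma}]$ with average error at most $2\lambda + 4 \cdot 2^{-\gamma}$, and it reads the bracket $[\cdot]$ as the integer part. The function names and the numerical values are hypothetical and chosen only so that all quantities are exactly representable.

```python
import math

def code_parameters(mu, gamma, lam):
    """Code size [2^(mu - gamma)] and error bound 2*lam + 4*2^(-gamma)."""
    size = math.floor(2 ** (mu - gamma))
    error_bound = 2 * lam + 4 * 2 ** (-gamma)
    return size, error_bound

def direct_part_parameters(n, chi_inf, theta, lam_n):
    """Parameter choice of the proof: mu_n = n(chi_inf - 3 theta), gamma_n = n theta."""
    mu_n = n * (chi_inf - 3 * theta)
    gamma_n = n * theta
    return code_parameters(mu_n, gamma_n, lam_n)

# Hypothetical values: chi_inf stands for inf_t chi(p, W_t).
n, chi_inf, theta, lam_n = 64, 1.0, 0.125, 2 ** -7
size, error_bound = direct_part_parameters(n, chi_inf, theta, lam_n)
rate = math.log2(size) / n   # equals chi_inf - 4 * theta for these values
```

With these inputs the rate is exactly $\chi_{\inf} - 4\theta$, which is the exponent appearing in $M'_n$, while the error bound consists of $2\lambda_n$ plus a term decaying exponentially in $n\theta$.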



By standard arguments we can select a sub-code for $W^n$ with
$M_n \ge (1/2) \cdot M'_n$ and maximum error probability $\lambda_n \le 2\lambda'_n$.
We denote this $(n, M_n, \lambda_n)_{\max}$-code by $C_n$. But since



$$W^n = \frac{1}{|T_n|} \sum_{t \in T_n} W_t'^{\,n}$$

it is clear that $C_n$ is an $(n, M_n, |T_n| \lambda_n)_{\max}$-code for the
compound channel $T_n$. We know from our Lemma 5.6 that

|T„| < isTl'^lnl^l'^". Thus since Z„ = [V^] and a„ = we 
see that 

$$\lim_{n \to \infty} |T_n| \lambda_n = 0$$

and we are done since $M_n \ge (1/2)[2^{n(\inf_{t \in T} \chi(p, W_t) - 4\theta)}] \ge [2^{n(\inf_{t \in T} \chi(p, W_t) - 5\theta)}]$ for all sufficiently large $n \in \mathbb{N}$. ■
Remark 5.11: Note that the error probability of the codes
constructed in the proof of Theorem 5.10 behaves like $1/n$
asymptotically. This is caused by our choice of $\tau_n$ as $\tau_n = 1/n^2$.
So we can achieve a faster decay of the decoding errors
by using better sequences $\tau_n$. For example, if we choose $\tau_n =$
2-"'^'" and replace D[ .^ in O by 

$$D'_{t,x} := (1 - \tau_n) D_{t,x} + \frac{\tau_n}{d} \mathbf{1}_{\mathcal{H}}$$

for all $x \in A$ and $t \in T$, we obtain, as a careful inspection
and a painless modification of the arguments applied so far
show, for each sufficiently small $\theta > 0$ $(n, M_n, \lambda_n)_{\max}$-codes
for the compound cq-channel $T$ with

and 

_ ^ ; 

for an appropriate positive constant c{9). 

B. The Strong Converse 

For the proof of the strong converse we simply follow 
Wolfowitz' strategy in [24], [25]. To this end we use Winter's 
result from [23] which is the core of the strong converse for 
the single memoryless cq-channel: 

Theorem 5.12 (Winter [23]): For $\lambda \in (0,1)$ there exists a
constant $K'(\lambda, \dim \mathcal{H}, |A|)$ such that for every memoryless
cq-channel $\{W^n\}_{n \in \mathbb{N}}$ with finite input alphabet $A$ and
finite-dimensional output Hilbert space $\mathcal{H}$ and every $(n, M_n, \lambda)_{\max}$-code
with the code words of the same type $p \in \mathcal{P}(A)$ the
inequality

$$M_n \le 2^{n\left(\chi(p, W) + K'(\lambda, \dim \mathcal{H}, |A|) \frac{1}{\sqrt{n}}\right)}$$

holds.

The proof of this theorem is implicit in the proof of Theorem
13 in [23].

Theorem 5.13 (Strong Converse): Let $\lambda \in (0,1)$. Then
there is a constant $K = K(\lambda, \dim \mathcal{H}, |A|)$ such that for any
compound cq-channel $\{W_t^n\}_{t \in T, n \in \mathbb{N}}$ and any $(n, M_n, \lambda)_{\max}$-code
$C_n$

$$\frac{1}{n} \log M_n \le \max_{p \in \mathcal{P}(A)} \inf_{t \in T} \chi(p, W_t) + K \frac{1}{\sqrt{n}}$$

holds.

Proof: Wolfowitz' proof of the strong converse [24], [25]
for the classical compound channel extends mutatis mutandis
to the cq-case once we have Theorem 5.12.

We fix $n \in \mathbb{N}$ and consider any $(n, M_n, \lambda)_{\max}$-code $C_n = (u_i, b_i)_{i=1}^{M_n}$. Each code word $u_i \in A^n$ induces a type (empirical
distribution) in $\mathcal{P}(A)$ and according to the standard
type counting lemma (cf. [5]) there are at most $(n+1)^{|A|}$
different types. We divide our code $C_n$ into sub-codes $C_{n,j} = (u_{j_k}, b_{j_k})_{k=1}^{M_j}$ such that the code words of each $C_{n,j}$ belong
to the same type class, i.e. induce the same type. It is clear
that the maximum error probabilities of these sub-codes are
bounded from above by $\lambda$ for all $t \in T$. Since we have a
uniform bound on error probabilities on each channel in the
class $T$ we may apply Winter's Theorem 5.12 and obtain

$$M_j \le 2^{n\left(\chi(p_j, W_t) + K'(\lambda, \dim \mathcal{H}, |A|) \frac{1}{\sqrt{n}}\right)} \qquad (43)$$

where $p_j$ denotes the type of the code words belonging to the
sub-code $C_{n,j}$. Since the left hand side of (43) does not depend
on $t$ we may conclude that

$$M_j \le 2^{n\left(\inf_{t \in T} \chi(p_j, W_t) + K'(\lambda, \dim \mathcal{H}, |A|) \frac{1}{\sqrt{n}}\right)} \le 2^{n\left(\max_{p \in \mathcal{P}(A)} \inf_{t \in T} \chi(p, W_t) + K'(\lambda, \dim \mathcal{H}, |A|) \frac{1}{\sqrt{n}}\right)} \qquad (44)$$

holds. Then, recalling that there are at most $(n+1)^{|A|}$ sub-codes
and using (44) we arrive at

$$M_n \le (n+1)^{|A|}\, 2^{n\left(\max_{p \in \mathcal{P}(A)} \inf_{t \in T} \chi(p, W_t) + K' \frac{1}{\sqrt{n}}\right)} \le 2^{n\left(\max_{p \in \mathcal{P}(A)} \inf_{t \in T} \chi(p, W_t) + K \frac{1}{\sqrt{n}}\right)}$$

with a suitable constant $K = K(\lambda, \dim \mathcal{H}, |A|)$. ■
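The last step absorbs the polynomial prefactor $(n+1)^{|A|}$ into the $1/\sqrt{n}$ correction, since $(n+1)^{|A|} = 2^{n \cdot |A| \log(n+1)/n}$ and $|A| \log(n+1)/n$ is of order $1/\sqrt{n}$ or smaller. The following Python sketch, with illustrative function names that are not from the paper, counts types exactly, compares the count with the bound $(n+1)^{|A|}$, and evaluates the resulting rate penalty.

```python
import math
from collections import Counter
from itertools import product

def number_of_types(n, alphabet_size):
    """Exact number of types of length-n words: compositions of n into |A| parts."""
    return math.comb(n + alphabet_size - 1, alphabet_size - 1)

def types_by_enumeration(n, alphabet):
    """Brute-force count: collect the empirical counts of all words in A^n."""
    seen = set()
    for word in product(alphabet, repeat=n):
        counts = Counter(word)
        seen.add(tuple(counts[a] for a in alphabet))
    return len(seen)

def rate_penalty(n, alphabet_size):
    """|A| * log2(n + 1) / n, the rate cost of the prefactor (n + 1)^|A|."""
    return alphabet_size * math.log2(n + 1) / n

exact = number_of_types(4, 3)               # types of ternary words of length 4
brute = types_by_enumeration(4, "abc")      # same count by direct enumeration
bound = (4 + 1) ** 3                        # type counting bound (n + 1)^|A|
scaled = max(rate_penalty(n, 3) * math.sqrt(n) for n in range(1, 10000))
```

The quantity `scaled` stays bounded as `n` grows, which is the numerical counterpart of enlarging $K'$ to a constant $K$ that still depends only on $\lambda$, $\dim \mathcal{H}$ and $|A|$.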

VI. Averaged Channels 

In this section we extend the results of Datta and Dorlas [6] 
to arbitrary averaged channels whose branches are memoryless 
cq-channels. 

Let $(T, \Sigma, \mu)$ be a probability space, i.e. $T$ is a set, $\Sigma$ is a
$\sigma$-algebra, and $\mu$ is a probability measure on $\Sigma$. Moreover we
consider a memoryless compound cq-channel $\{W_t^n\}_{t \in T, n \in \mathbb{N}}$
with finite input alphabet $A$ and finite-dimensional output
Hilbert space $\mathcal{H}$. We assume that the branches $W_t$, $t \in T$,
depend measurably on $t \in T$, i.e. we assume that for each
fixed $x \in A$ the maps $T \ni t \mapsto D_{t,x} \in \mathcal{S}(\mathcal{H})$ are measurable.
We assume here that $\mathcal{S}(\mathcal{H})$ is endowed with its natural Borel
$\sigma$-algebra.

The averaged channel $W = \{W^n\}_{n \in \mathbb{N}}$ is defined by the
following prescription: For any $n \in \mathbb{N}$ we have a map
$W^n : A^n \ni x^n \mapsto D_{x^n} \in \mathcal{S}(\mathcal{H}^{\otimes n})$ where $D_{x^n}$ is the density
operator uniquely determined by the requirement that for all
$b \in \mathcal{B}(\mathcal{H}^{\otimes n})$ the relation

$$\mathrm{tr}(D_{x^n} b) = \int_T \mathrm{tr}(D_{t,x^n} b)\, \mu(dt)$$

holds. (Note that $\mathrm{tr}(D_{t,x^n} b)$ depends measurably on $t$ since tensor and ordinary
products of operators are continuous and hence measurable operations.)

A code $C_n = (x^n(i), b_i)_{i=1}^{M_n}$ for the averaged channel
$\{W^n\}_{n \in \mathbb{N}}$ consists as before of codewords $x^n(i) \in A^n$ and
decoding operators $b_i \in \mathcal{B}(\mathcal{H})^{\otimes n}$, $b_i \ge 0$, $\sum_{i=1}^{M_n} b_i \le \mathbf{1}^{\otimes n}$.
The integer $M_n$ is the size of the code. Achievable rates and
the capacity $C(W)$ are defined in a similar fashion as for
memoryless cq-channels.

We will show in the following two subsections that, in analogy
to the classical case [2], the weak capacity of $W$ is given by

$$C(W) = \sup_{p \in \mathcal{P}(A)} \operatorname*{ess\text{-}inf}_{t \in T} \chi(p, W_t), \qquad (45)$$

where ess-inf denotes the essential infimum.* Clearly,
we cannot expect the strong converse to hold because of
Ahlswede's [2] counterexamples in the classical setting.

A. The direct part of the Coding Theorem 

We will need some simple properties of the essential in- 
fimum in the proof of the direct part of the coding theorem 
for the averaged channel W . We start with a simple general 
property of the essential infimum: 

Lemma 6.1: Let $(T, \Sigma, \mu)$ be a probability space and $f : T \to \mathbb{R}$ any measurable function. Let $a := \operatorname*{ess\text{-}inf}_{t \in T} f$.
Then the set $A := \{t \in T : f(t) \ge a\}$ satisfies

$$\mu(A) = 1.$$

Proof: The assertion of the lemma follows easily from
the definition of the essential infimum. ■
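On a finite probability space the essential infimum and the statement of Lemma 6.1 can be checked directly. The sketch below is a minimal illustration under that finiteness assumption: with finitely many points, $\sup\{c : \mu(\{f < c\}) = 0\}$ is simply the smallest value of $f$ on a point of positive mass. The function names and the sample data are hypothetical.

```python
def essential_infimum(values, weights):
    """ess-inf of f on a finite probability space.

    values[t] is f(t) and weights[t] is mu({t}); points of zero mass are ignored.
    """
    support = [v for v, w in zip(values, weights) if w > 0]
    if not support:
        raise ValueError("the measure has no mass")
    return min(support)

def mass_of_upper_set(values, weights, a):
    """mu({t : f(t) >= a}), the quantity appearing in Lemma 6.1."""
    return sum(w for v, w in zip(values, weights) if v >= a)

# Hypothetical data: the first point has the smallest value but carries no mass.
f = [0.0, 1.0, 0.7]
mu = [0.0, 0.5, 0.5]

a = essential_infimum(f, mu)
plain_infimum = min(f)
mass = mass_of_upper_set(f, mu, a)
```

Here the ordinary infimum is attained on a null set, so it is strictly smaller than the essential infimum, while the set $\{f \ge a\}$ still has full measure.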
Our proof of the direct part of the coding theorem will be 
based on a reduction to the case of compound cq-channels. 
Therefore we have to give another characterization of 

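$$\sup_{p \in \mathcal{P}(A)} \operatorname*{ess\text{-}inf}_{t \in T} \chi(p, W_t)$$

in terms of the optimization processes appearing in the capacity
formula for the compound cq-channels. To this end we
define for any $p \in \mathcal{P}(A)$

$$a(p) := \operatorname*{ess\text{-}inf}_{t \in T} \chi(p, W_t),$$

and

$$T_p := \{t \in T : \chi(p, W_t) \ge a(p)\}.$$

Lemma 6.2: Let $\{W^n\}_{n \in \mathbb{N}}$ be the averaged cq-channel
defined by the probability space $(T, \Sigma, \mu)$ and the compound
cq-channel $T$. Then

$$\sup_{p \in \mathcal{P}(A)} \max_{q \in \mathcal{P}(A)} \inf_{t \in T_p} \chi(q, W_t) = \sup_{q \in \mathcal{P}(A)} \operatorname*{ess\text{-}inf}_{t \in T} \chi(q, W_t).$$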

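Proof: $\mu(T_p) = 1$ holds by Lemma 6.1. For $p, q \in \mathcal{P}(A)$
and the corresponding sets $T_p, T_q \subseteq T$ we have

$$\inf_{t \in T_p} \chi(q, W_t) \le \inf_{t \in T_p \cap T_q} \chi(q, W_t) \le \operatorname*{ess\text{-}inf}_{t \in T} \chi(q, W_t), \qquad (46)$$

where the last inequality is justified by the observation
that $\mu(T_p \cap T_q) = 1$ and that $T_p \cap T_q \subseteq \{t \in T : \chi(q, W_t) \ge \inf_{t \in T_p \cap T_q} \chi(q, W_t)\}$, i.e. $\mu(\{t \in T : \chi(q, W_t) < \inf_{t \in T_p \cap T_q} \chi(q, W_t)\}) = 0$, and (46) holds by definition of the
essential infimum. (46) implies that

$$\max_{q \in \mathcal{P}(A)} \inf_{t \in T_p} \chi(q, W_t) \le \sup_{q \in \mathcal{P}(A)} \operatorname*{ess\text{-}inf}_{t \in T} \chi(q, W_t),$$

and consequently

$$\sup_{p \in \mathcal{P}(A)} \max_{q \in \mathcal{P}(A)} \inf_{t \in T_p} \chi(q, W_t) \le \sup_{q \in \mathcal{P}(A)} \operatorname*{ess\text{-}inf}_{t \in T} \chi(q, W_t). \qquad (47)$$

*The essential infimum of a measurable function $f : T \to \mathbb{R}$ on the
probability space $(T, \Sigma, \mu)$ is defined by $\operatorname*{ess\text{-}inf}_{t \in T} f := \sup\{c \in \mathbb{R} : \mu(\{t \in T : f(t) < c\}) = 0\}$.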

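In order to show the reverse inequality we choose for any
$\varepsilon > 0$ a $q_\varepsilon \in \mathcal{P}(A)$ with

$$\sup_{q \in \mathcal{P}(A)} \operatorname*{ess\text{-}inf}_{t \in T} \chi(q, W_t) \le \operatorname*{ess\text{-}inf}_{t \in T} \chi(q_\varepsilon, W_t) + \varepsilon. \qquad (48)$$

By definition of the set $T_{q_\varepsilon}$ as

$$T_{q_\varepsilon} = \{t \in T : \chi(q_\varepsilon, W_t) \ge a(q_\varepsilon)\},$$

with $a(q_\varepsilon) = \operatorname*{ess\text{-}inf}_{t \in T} \chi(q_\varepsilon, W_t)$ we have

$$\operatorname*{ess\text{-}inf}_{t \in T} \chi(q_\varepsilon, W_t) \le \inf_{t \in T_{q_\varepsilon}} \chi(q_\varepsilon, W_t). \qquad (49)$$

The inequalities (48) and (49) show that

$$\sup_{q \in \mathcal{P}(A)} \operatorname*{ess\text{-}inf}_{t \in T} \chi(q, W_t) \le \inf_{t \in T_{q_\varepsilon}} \chi(q_\varepsilon, W_t) + \varepsilon,$$

which in turn yields

$$\sup_{q \in \mathcal{P}(A)} \operatorname*{ess\text{-}inf}_{t \in T} \chi(q, W_t) \le \sup_{p \in \mathcal{P}(A)} \max_{q \in \mathcal{P}(A)} \inf_{t \in T_p} \chi(q, W_t) + \varepsilon.$$

Since $\varepsilon > 0$ can be made arbitrarily small and the left hand
side of the last inequality does not depend on $\varepsilon$ we finally
obtain

$$\sup_{q \in \mathcal{P}(A)} \operatorname*{ess\text{-}inf}_{t \in T} \chi(q, W_t) \le \sup_{p \in \mathcal{P}(A)} \max_{q \in \mathcal{P}(A)} \inf_{t \in T_p} \chi(q, W_t),$$

which concludes our proof. ■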
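The identity of Lemma 6.2 can be illustrated on a finite toy model. The sketch below is only an illustration under simplifying assumptions: the set $\mathcal{P}(A)$ is replaced by two hypothetical candidate distributions `q1` and `q2`, the index set $T$ has three points, and the table `chi` holds made-up values standing in for $\chi(q, W_t)$. None of these numbers come from the paper.

```python
def essential_infimum(values, weights):
    """Smallest value of f on a point of positive mass (finite probability space)."""
    return min(v for v, w in zip(values, weights) if w > 0)

def lemma_6_2_sides(chi, mu):
    """Return both sides of Lemma 6.2 for a finite table chi[q][t] and weights mu[t]."""
    names = list(chi)
    # a(p) = ess-inf_t chi(p, W_t) and T_p = {t : chi(p, W_t) >= a(p)}
    a = {p: essential_infimum(chi[p], mu) for p in names}
    T = {p: [t for t, v in enumerate(chi[p]) if v >= a[p]] for p in names}
    lhs = max(max(min(chi[q][t] for t in T[p]) for q in names) for p in names)
    rhs = max(a.values())
    return lhs, rhs

# Hypothetical table: the branch t = 0 carries no mass.
chi = {"q1": [0.0, 0.9, 0.6], "q2": [0.8, 0.5, 0.7]}
mu = [0.0, 0.5, 0.5]

lhs, rhs = lemma_6_2_sides(chi, mu)
```

For `p = q2` the set $T_p$ contains the null branch, which pulls the inner infimum down, whereas `p = q1` excludes it. The outer supremum over $p$ picks the favourable set, as in the second half of the proof.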
Theorem 6.3 (Direct Part): Let $W$ denote the averaged cq-channel. Then

$$C(W) \ge \sup_{p \in \mathcal{P}(A)} \operatorname*{ess\text{-}inf}_{t \in T} \chi(p, W_t).$$

Proof: We assume that

$$\sup_{p \in \mathcal{P}(A)} \operatorname*{ess\text{-}inf}_{t \in T} \chi(p, W_t) > 0$$

since otherwise the assertion of the theorem is trivially true.
By Lemma 6.2 it is enough to show that for each $p \in \mathcal{P}(A)$
with

$$\max_{q \in \mathcal{P}(A)} \inf_{t \in T_p} \chi(q, W_t) > 0$$

the rate

$$\max_{q \in \mathcal{P}(A)} \inf_{t \in T_p} \chi(q, W_t) - \varepsilon$$

is achievable for each sufficiently small $\varepsilon > 0$. But this follows
immediately if we apply our Theorem 5.10 to the compound
channel $T_p$ since any good code for the compound cq-channel
$T_p$ has the same performance for the averaged channel $W^n$
due to the fact that $\mu(T_p) = 1$. ■



B. The Weak Converse 

We start with a general property of the essential infimum 
which will help us to reduce the arguments in the proof of the 
weak converse to Fano's inequality and Holevo's bound via 
Markov's inequality. 

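Lemma 6.4: Consider a probability space $(T, \Sigma, \mu)$. Let
$f, f_n : T \to \mathbb{R}$, $n \in \mathbb{N}$, be measurable bounded functions
with

$$\lim_{n \to \infty} f_n(t) = f(t) \quad \forall t \in T. \qquad (50)$$

Let $(G_n)_{n \in \mathbb{N}}$ be a sequence of measurable subsets of $T$ with

$$\lim_{n \to \infty} \mu(G_n) = 1.$$

Then

$$\limsup_{n \to \infty} \inf_{t \in G_n} f_n(t) \le \operatorname*{ess\text{-}inf}_{t \in T} f \qquad (51)$$

holds.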

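Proof: The proof will be accomplished if we can show
the following two inequalities:

$$\limsup_{n \to \infty} \inf_{t \in G_n} f_n(t) \le \limsup_{n \to \infty} \inf_{t \in G_n} f(t), \qquad (52)$$

and

$$\limsup_{n \to \infty} \inf_{t \in G_n} f(t) \le \operatorname*{ess\text{-}inf}_{t \in T} f. \qquad (53)$$

Proof of (52): Set

$$b_n := \inf_{t \in G_n} f(t) \quad \text{and} \quad b'_n := \inf_{t \in G_n} f_n(t).$$

Then to any $\varepsilon > 0$ we can find a $t_\varepsilon \in G_n$ with

$$f(t_\varepsilon) \le b_n + \varepsilon \qquad (54)$$

and, by (50), there is $n(\varepsilon) \in \mathbb{N}$ such that for all $n \ge n(\varepsilon)$ we
have

$$f_n(t_\varepsilon) \le f(t_\varepsilon) + \varepsilon. \qquad (55)$$

Then the definition of $b'_n$, (54), and (55) yield

$$b'_n \le b_n + 2\varepsilon$$

for all $n \ge n(\varepsilon)$. This implies

$$\limsup_{n \to \infty} b'_n \le \limsup_{n \to \infty} b_n + 2\varepsilon,$$

and since $\varepsilon > 0$ is arbitrary we obtain (52).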
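Proof of (53): As in the first part of the proof we use the
abbreviation

$$b_n := \inf_{t \in G_n} f(t),$$

and additionally we set

$$b := \limsup_{n \to \infty} b_n.$$

Then by the very basic properties of the upper limit we can
select a subsequence $(n_i)_{i \in \mathbb{N}}$ with

$$\lim_{i \to \infty} b_{n_i} = b. \qquad (56)$$

In order to keep the notation as simple as possible we will
denote this induced sequence $(b_{n_i})_{i \in \mathbb{N}}$ by $(b_n)_{n \in \mathbb{N}}$, i.e. we
simply rename the subsequence. For any fixed $n \in \mathbb{N}$ we
consider the sequence $(A_{n,k})_{k \in \mathbb{N}}$ consisting of measurable
subsets of $T$ defined by $A_{n,k} := \bigcup_{i=1}^{k} G_{n+i}$. Note that
for each $n \in \mathbb{N}$ the sequence $(A_{n,k})_{k \in \mathbb{N}}$ has the following
properties which are easy to check:

1) $A_{n,1} \subseteq A_{n,2} \subseteq \dots$,

2) $\lim_{k \to \infty} \mu(A_{n,k}) = 1$,

3) $a_{n,k} := \inf_{t \in A_{n,k}} f(t) = \min\{b_{n+1}, b_{n+2}, \dots, b_{n+k}\}$,
the sequence $(a_{n,k})_{k \in \mathbb{N}}$ is non-increasing for any $n \in \mathbb{N}$,
and

4) for $A_n := \bigcup_{k \in \mathbb{N}} A_{n,k}$ and $a_n := \inf_{t \in A_n} f(t)$ we
have $\mu(A_n) = 1$, $a_n \le \operatorname*{ess\text{-}inf}_{t \in T} f$, and $a_n = \lim_{k \to \infty} a_{n,k}$ for each $n \in \mathbb{N}$.

In view of these properties it suffices to prove that for each
$\varepsilon > 0$ there is $n(\varepsilon) \in \mathbb{N}$ such that

$$b - \varepsilon \le a_{n(\varepsilon),k} \le b + \varepsilon \quad \forall k \in \mathbb{N} \qquad (57)$$

holds. In fact, (57) implies then that

$$b - \varepsilon \le a_{n(\varepsilon)} \le b + \varepsilon,$$

since $a_{n(\varepsilon)} = \lim_{k \to \infty} a_{n(\varepsilon),k}$, and by choosing an appropriate
sequence $(\varepsilon_j)_{j \in \mathbb{N}}$ with $\varepsilon_j \searrow 0$ we can conclude that

$$b = \limsup_{j \to \infty} a_{n(\varepsilon_j)}.$$

But then $b \le \operatorname*{ess\text{-}inf}_{t \in T} f$ by $a_{n(\varepsilon_j)} \le \operatorname*{ess\text{-}inf}_{t \in T} f$ for all
$j \in \mathbb{N}$.

Thus we only need to prove (57) which follows from (56)
(with our convention to suppress the index $i$): To any $\varepsilon > 0$
we can find by (56) an $n(\varepsilon) \in \mathbb{N}$ such that for all $n \ge n(\varepsilon)$
we have

$$b - \varepsilon \le b_n \le b + \varepsilon.$$

Then by property 3) above we obtain for each $k \in \mathbb{N}$

$$b - \varepsilon \le \min\{b_{n(\varepsilon)+1}, \dots, b_{n(\varepsilon)+k}\} = a_{n(\varepsilon),k} \le b + \varepsilon,$$

which is the desired relation. ■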
As a last preliminary result we need the generalization of 
Lemma 6 in [4]. 

Lemma 6.5: Let $\{W^n\}_{n \in \mathbb{N}}$ be a memoryless cq-channel
with input alphabet $A$ and output Hilbert space $\mathcal{H}$. Then
for any $(n, M_n, \varepsilon_n)_{\mathrm{av}}$-code $C_n = (x^n(i), b_i)_{i=1}^{M_n}$ with distinct
codewords we have

$$(1 - \varepsilon_n) \log M_n \le n \chi(p^*, W) + 1,$$

where $p^* = \frac{1}{M_n} \sum_{i=1}^{M_n} p_{x^n(i)} \in \mathcal{P}(A)$ with empirical distributions
or types $p_{x^n(i)} \in \mathcal{P}(A)$ of the codewords $x^n(i)$ for
$i = 1, \dots, M_n$.

Proof: The proof is based upon similar arguments as
that of the corresponding Lemma 6 in [4]. The only additional
argument we need is Holevo's bound. The details are as
follows: We may assume w.l.o.g. that $\sum_{i=1}^{M_n} b_i = \mathbf{1}^{\otimes n}$ and
define the corresponding classical channel by

$$K(j|i) := \mathrm{tr}(D_{x^n(i)} b_j), \quad i, j \in \{1, \dots, M_n\}.$$

Let $\nu \in \mathcal{P}(A^n)$ be given by $\nu(x^n) = \frac{1}{M_n}$ if $x^n$ is one of
$x^n(i)$, $i = 1, \dots, M_n$, and $\nu(x^n) = 0$ else. In what follows we
consider the marginal distributions $\nu_1, \dots, \nu_n \in \mathcal{P}(A)$ induced
by $\nu \in \mathcal{P}(A^n)$. It is obvious that

$$p^*(a) = \frac{1}{n} \sum_{j=1}^{n} \nu_j(a) \quad \forall a \in A \qquad (58)$$

holds. From Fano's inequality and Holevo's bound we obtain

$$(1 - \varepsilon_n) \log M_n \le I(\nu, K) + 1 \le \chi(\nu, W^n) + 1, \qquad (59)$$

where $I(\nu, K)$ denotes the mutual information evaluated for
the input distribution $\nu$ and the classical channel $K$. Using
the super-additivity (cf. [16]) and concavity (w.r.t. the input
distribution) of the Holevo information we get

$$\chi(\nu, W^n) \le \sum_{j=1}^{n} \chi(\nu_j, W) \le n \chi(p^*, W), \qquad (60)$$

where we have used (58) in the last inequality. Inserting (60)
into (59) yields the claimed relation. ■
The corresponding weak converse is the content of the next 
theorem. 

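Theorem 6.6 (Weak Converse): Let $W$ be the averaged
channel defined by the probability space $(T, \Sigma, \mu)$ and
the compound channel $T$. Then any sequence $(C_n)_{n \in \mathbb{N}}$ of
$(n, M_n, \varepsilon_n)_{\mathrm{av/max}}$-codes with $\lim_{n \to \infty} \varepsilon_n = 0$ fulfills

$$\limsup_{n \to \infty} \frac{1}{n} \log M_n \le \sup_{p \in \mathcal{P}(A)} \operatorname*{ess\text{-}inf}_{t \in T} \chi(p, W_t).$$

Proof: Let $(C_n)_{n \in \mathbb{N}}$ be any sequence of $(n, M_n, \varepsilon_n)_{\mathrm{av}}$-codes
with $\lim_{n \to \infty} \varepsilon_n = 0$, i.e.

$$\int_T e_{\mathrm{av}}(t, C_n)\, \mu(dt) = \varepsilon_n,$$

where

$$e_{\mathrm{av}}(t, C_n) := \frac{1}{M_n} \sum_{i=1}^{M_n} \big(1 - \mathrm{tr}(D_{t,x^n(i)} b_i)\big).$$

Set

$$G_n := \{t \in T : e_{\mathrm{av}}(t, C_n) \le \sqrt{\varepsilon_n}\}. \qquad (61)$$

Then Markov's inequality yields

$$\mu(G_n) \ge 1 - \sqrt{\varepsilon_n}. \qquad (62)$$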
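The Markov step can be checked numerically on a finite set of branches. In the sketch below the per-branch error probabilities and the weights are hypothetical numbers chosen for illustration. The code computes the averaged error $\varepsilon_n$, the set of branches whose error does not exceed $\sqrt{\varepsilon_n}$, and the mass of that set.

```python
import math

def good_branch_mass(errors, weights):
    """Averaged error eps and the mass of {t : error(t) <= sqrt(eps)}.

    errors[t] plays the role of the average error on branch t,
    weights[t] the role of mu({t}).
    """
    eps = sum(e * w for e, w in zip(errors, weights))
    threshold = math.sqrt(eps)
    mass = sum(w for e, w in zip(errors, weights) if e <= threshold)
    return eps, mass

# Hypothetical branch errors: one rare branch on which the code performs badly.
errors = [0.0, 0.01, 0.5]
weights = [0.6, 0.38, 0.02]

eps, mass = good_branch_mass(errors, weights)
markov_lower_bound = 1 - math.sqrt(eps)
```

Markov's inequality gives $\mu(\{e > \sqrt{\varepsilon}\}) \le \varepsilon / \sqrt{\varepsilon} = \sqrt{\varepsilon}$, so the computed mass can never fall below `markov_lower_bound`. In this example the bound is not tight.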



If we choose ni G N such that ,Je^ < ^ for all n > rii then 
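all the code words are distinct and we can apply Lemma 6.5
to each $t \in G_n$ (cf. (61)) leading to

$$(1 - \sqrt{\varepsilon_n}) \log M_n \le n \chi(p^*, W_t) + 1,$$

which is equivalent to

$$\frac{1}{n} \log M_n \le \frac{\chi(p^*, W_t) + \frac{1}{n}}{1 - \sqrt{\varepsilon_n}} \qquad (63)$$

for all $t \in G_n$ and all $n \ge n_1$. Since (63) holds for all $t \in G_n$
we obtain

$$\frac{1}{n} \log M_n \le \frac{\inf_{t \in G_n} \chi(p^*, W_t) + \frac{1}{n}}{1 - \sqrt{\varepsilon_n}}. \qquad (64)$$

Recall that $p^*$ depends on the block length $n$. Thus we are
done if we can show that

$$\limsup_{n \to \infty} \max_{p \in \mathcal{P}(A)} \inf_{t \in G_n} \chi(p, W_t) \le \sup_{p \in \mathcal{P}(A)} \operatorname*{ess\text{-}inf}_{t \in T} \chi(p, W_t) \qquad (65)$$

holds.

For each $n \in \mathbb{N}$ with $n \ge n_1$ we choose $p_n \in \mathcal{P}(A)$ with

$$\inf_{t \in G_n} \chi(p_n, W_t) = \max_{p \in \mathcal{P}(A)} \inf_{t \in G_n} \chi(p, W_t).$$

By passing to a subsequence if necessary we may assume that

$$\lim_{n \to \infty} \inf_{t \in G_n} \chi(p_n, W_t) = \limsup_{n \to \infty} \max_{p \in \mathcal{P}(A)} \inf_{t \in G_n} \chi(p, W_t). \qquad (66)$$

By selecting a further subsequence we can even ensure that
$\lim_{j \to \infty} p_{n_j} =: p' \in \mathcal{P}(A)$ due to the compactness of $\mathcal{P}(A)$.
By (66) we have

$$\lim_{j \to \infty} \inf_{t \in G_{n_j}} \chi(p_{n_j}, W_t) = \limsup_{n \to \infty} \max_{p \in \mathcal{P}(A)} \inf_{t \in G_n} \chi(p, W_t). \qquad (67)$$

Now, since

$$\lim_{j \to \infty} \chi(p_{n_j}, W_t) = \chi(p', W_t)$$

for all $t \in T$ by the continuity of Holevo information,
and since $\lim_{j \to \infty} \mu(G_{n_j}) = 1$ by (62), we see that the
assumptions of Lemma 6.4 are fulfilled for the functions

$$f_j(t) := \chi(p_{n_j}, W_t) \quad \text{and} \quad f(t) := \chi(p', W_t).$$

Thus Lemma 6.4 and (67) show that

$$\limsup_{n \to \infty} \max_{p \in \mathcal{P}(A)} \inf_{t \in G_n} \chi(p, W_t) \le \operatorname*{ess\text{-}inf}_{t \in T} \chi(p', W_t) \le \sup_{p \in \mathcal{P}(A)} \operatorname*{ess\text{-}inf}_{t \in T} \chi(p, W_t).$$

This is exactly (65) and we are done. ■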

VII. Conclusion 

In this paper we have shown the existence of universally 
"good" classical-quantum codes for two particularly inter- 
esting cq-channel models with limited channel knowledge. 
We determined the optimal transmission rates for the classes 
of compound and averaged cq-channels. For the first model 
we could prove the strong converse for the maximum error 
criterion whereas for the latter only a weak converse is 
established. 

The coding theorems for compound and averaged cq-channels
imply in an obvious way the corresponding capacity formulas
for the classical product state capacities of compound and
averaged quantum channels (cf. the arguments in [16], [20],
[23] for memoryless quantum channels). To be specific the
classical product state capacity of a family $\{\mathcal{N}_t : \mathcal{B}(\mathcal{H}') \to \mathcal{B}(\mathcal{H})\}_{t \in T}$
of quantum channels, as described by completely
positive, trace preserving maps, is given, according to our
results, by

$$C_1(\{\mathcal{N}_t\}_{t \in T}) = \sup_{\{p_i, D_i\}} \inf_{t \in T} \chi(\{p_i, \mathcal{N}_t(D_i)\}),$$

where the supremum is taken over all ensembles $\{p_i, D_i\}$
of possible input states $D_i \in \mathcal{S}(\mathcal{H}')$ occurring according to
the probability distribution $(p_i)$, and

$$\chi(\{p_i, \mathcal{N}_t(D_i)\}) := S\Big(\sum_i p_i \mathcal{N}_t(D_i)\Big) - \sum_i p_i S(\mathcal{N}_t(D_i)).$$

The full classical capacity of $\{\mathcal{N}_t\}_{t \in T}$ is then

$$C(\{\mathcal{N}_t\}_{t \in T}) = \lim_{n \to \infty} \frac{1}{n} C_1(\{\mathcal{N}_t^{\otimes n}\}_{t \in T}),$$

and the limit is in general necessary by a counterexample to
the additivity conjecture given by Hastings [9].
The capacity results for compound and averaged cq-channels
show nicely the impact of the degree of channel uncertainty
on the capacity. In fact, for the compound cq-channel we
merely know that the information transmission happens over
an unknown memoryless cq-channel which belongs to an a
priori given set of channels. The capacity formula (6) is
the best worst-case rate we can guarantee simultaneously
for all involved channels. For averaged cq-channels, on the
other hand, the formula (45) takes into account only the
almost sure worst-case cq-channel, since we are given
additional information represented by the probability measure
on the memoryless branches. Consequently, the capacity of
compound cq-channels is smaller than the capacity of their
averaged counterparts in many natural situations. A simple
example illustrating this effect is as follows.
Let $T := \{1, \dots, K\}$ be a finite set and let $W_1, \dots, W_K : \{0, 1\} \to \mathcal{S}(\mathbb{C}^2)$ be cq-channels defined as follows.
Let $W_1$ be any channel with the capacity $C(W_1) = 0$. For
$j \in \{2, \dots, K\}$ select distinct unitaries $U_2, \dots, U_K$ acting
on $\mathbb{C}^2$ and define $W_j(b) := U_j |e_b\rangle\langle e_b| U_j^*$ where $b \in \{0, 1\}$,
$j \in \{2, \dots, K\}$ and $e_0, e_1$ is the canonical basis of $\mathbb{C}^2$. Note
that for each $p \in \mathcal{P}(\{0, 1\})$ and $j \in \{2, \dots, K\}$

$$\chi(p, W_j) = H(p)$$

holds, and consequently $C(W_2) = \dots = C(W_K) = 1$.
Since any sequence of codes with asymptotically vanishing
probability of error for the compound cq-channel $T$ has to be
reliable for each of our channels $W_1, \dots, W_K$ and especially
for $W_1$, we see that the only achievable rate for $T$ is 0.
Consequently $C(T) = 0$. Now, if both the transmitter and
receiver have additional information that the channels from
$T$ are drawn according to the a priori probability distribution
$\mu(1) = 0$ and $\mu(i) = \frac{1}{K-1}$ for $i \in \{2, \dots, K\}$ then it follows
from Theorem 6.3 that

$$C(W) \ge \sup_{p \in \mathcal{P}(\{0,1\})} \operatorname*{ess\text{-}inf}_{t \in T} \chi(p, W_t) = \sup_{p \in \mathcal{P}(\{0,1\})} \min_{t \in \{2, \dots, K\}} \chi(p, W_t) = \sup_{p \in \mathcal{P}(\{0,1\})} H(p) = 1,$$

where $W$ denotes the averaged channel associated with $T$ and
$\mu$.
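The example can be reproduced numerically for qubits. The sketch below makes several assumptions that are not in the text: $W_1$ is taken to be a constant channel (one admissible choice with zero capacity), the unitaries are real rotations by hypothetical angles, the measure is uniform on the branches $2, \dots, K$ with $\mu(1) = 0$, and the supremum over $p$ is approximated on a grid. Entropies of $2 \times 2$ density matrices are obtained from the closed-form eigenvalues.

```python
import math

def h2(p):
    """Binary entropy in bits."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

def entropy_2x2(rho):
    """von Neumann entropy (bits) of a real symmetric 2x2 density matrix of trace 1."""
    det = rho[0][0] * rho[1][1] - rho[0][1] * rho[1][0]
    disc = math.sqrt(max(0.0, 1.0 - 4.0 * det))
    eigenvalues = [(1 + disc) / 2, (1 - disc) / 2]
    return -sum(x * math.log2(x) for x in eigenvalues if x > 1e-15)

def rotated_projector(phi, b):
    """U |e_b><e_b| U^T for the real rotation U by the angle phi."""
    col = (math.cos(phi), math.sin(phi)) if b == 0 else (-math.sin(phi), math.cos(phi))
    return [[col[i] * col[j] for j in range(2)] for i in range(2)]

def holevo(p, channel):
    """chi(p, W) for a binary-input channel given as the pair (W(0), W(1))."""
    rho0, rho1 = channel
    avg = [[p * rho0[i][j] + (1 - p) * rho1[i][j] for j in range(2)] for i in range(2)]
    return entropy_2x2(avg) - p * entropy_2x2(rho0) - (1 - p) * entropy_2x2(rho1)

# Assumed instance with K = 4: W_1 is constant, W_2..W_4 use hypothetical angles.
e0 = rotated_projector(0.0, 0)
channels = [(e0, e0)] + [(rotated_projector(phi, 0), rotated_projector(phi, 1))
                         for phi in (0.0, 0.4, 1.1)]
mu = [0.0] + [1.0 / 3.0] * 3

grid = [i / 100 for i in range(101)]
compound_value = max(min(holevo(p, W) for W in channels) for p in grid)
averaged_value = max(min(holevo(p, W) for W, m in zip(channels, mu) if m > 0)
                     for p in grid)
```

Up to rounding, `compound_value` is 0 because the minimum always includes $W_1$, while `averaged_value` is 1 because the essential infimum ignores the null branch and $H(p)$ peaks at $p = 1/2$.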
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Appendix I 
Proof of Theorem 5.4

This appendix is devoted to the proof of Theorem 5.4. We
will apply a random coding argument of Hayashi and Nagaoka
which in turn is based on the following operator inequality
which we quote from the work [13] by Hayashi and Nagaoka:

Theorem I.1 (Hayashi & Nagaoka [13]): Let $\mathcal{K}$ be
a finite-dimensional Hilbert space. For any operators
$a, b \in \mathcal{B}(\mathcal{K})$ with $0 \le a \le \mathbf{1}$ and $b \ge 0$, we have

$$\mathbf{1} - \sqrt{a+b}^{\,-1}\, a\, \sqrt{a+b}^{\,-1} \le 2(\mathbf{1} - a) + 4b, \qquad (68)$$

where $(\cdot)^{-1}$ denotes the generalized inverse.

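Let us first note that our projection $P \in \mathcal{B}_{\mathrm{diag}} \otimes \mathcal{B}(\mathcal{K})$ can be
uniquely written as

$$P = \sum_{k \in K} |k\rangle\langle k| \otimes P_k$$

with suitable projections $P_k \in \mathcal{B}(\mathcal{K})$ for all $k \in K$. With this
representation we have

$$\mathrm{tr}(\rho P) = \sum_{k \in K} w(k)\, \mathrm{tr}(D_k P_k), \qquad (69)$$

and

$$\mathrm{tr}((w \otimes \sigma) P) = \sum_{k \in K} w(k)\, \mathrm{tr}(\sigma P_k). \qquad (70)$$

Now let us set $M := [2^{\mu - \gamma}]$ and consider i.i.d. random
variables $U_1, \dots, U_M$ with values in $K$ each of which is
distributed according to $w \in \mathcal{P}(K)$. Moreover we set

$$b_i(U_1, \dots, U_M) := \Big(\sum_{j=1}^{M} P_{U_j}\Big)^{-1/2} P_{U_i} \Big(\sum_{j=1}^{M} P_{U_j}\Big)^{-1/2}. \qquad (71)$$

Applying Theorem I.1 we obtain

$$\mathbf{1}_{\mathcal{K}} - b_i(U_1, \dots, U_M) \le 2(\mathbf{1}_{\mathcal{K}} - P_{U_i}) + 4 \sum_{j \ne i} P_{U_j}. \qquad (72)$$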



i=i 



In the following consideration we use the shorthand $e(U)$
for the average error probability of the random code
$(U_i, b_i(U_1, \dots, U_M))_{i=1}^{M}$, i.e. we set

$$e(U) := \frac{1}{M} \sum_{i=1}^{M} \mathrm{tr}\big(D_{U_i}(\mathbf{1}_{\mathcal{K}} - b_i(U_1, \dots, U_M))\big).$$

Recalling the fact that $U_1, \dots, U_M$ are i.i.d. each distributed
according to $w$, (72) yields

$$\mathbb{E}_{U_1, \dots, U_M}(e(U)) \le \frac{2}{M} \sum_{i=1}^{M} \sum_{k \in K} w(k)\, \mathrm{tr}(D_k(\mathbf{1}_{\mathcal{K}} - P_k)) + \frac{4(M-1)M}{M} \sum_{k \in K} w(k)\, \mathrm{tr}(\sigma P_k)$$

$$\le 2\, \mathrm{tr}(\rho(\mathbf{1} - P)) + 4 \cdot M \cdot \mathrm{tr}((w \otimes \sigma) P)$$

$$\le 2 \cdot \lambda + 4 \cdot 2^{-\gamma}, \qquad (73)$$

where we have used (69) and (70) in the second inequality.
(73) shows that there must be at least one deterministic
code $(k_i, b_i)_{i=1}^{M}$, which is a realization of the random code
$(U_i, b_i(U_1, \dots, U_M))_{i=1}^{M}$, with average error probability at most
$2 \cdot \lambda + 4 \cdot 2^{-\gamma}$, which concludes the proof of Theorem 5.4.